andreahlert opened a new pull request, #61627: URL: https://github.com/apache/airflow/pull/61627
Fixes: #58936

## Summary

When a Kubernetes worker pod receives SIGTERM (e.g. spot interruption, scaling down, rolling update), the signal is delivered to the supervisor process (PID 1 in the container). Previously, the supervisor had no signal handler installed and would exit with default behavior, leaving the task subprocess orphaned without ever calling the operator's `on_kill()` hook. This meant spawned resources (pods, subprocesses, etc.) were never cleaned up.

**Root cause**: The `supervise()` function starts the task subprocess and calls `process.wait()`, but never installs signal handlers for SIGTERM/SIGINT. The task subprocess *does* have a SIGTERM handler (registered in `task_runner.py`) that calls `on_kill()`, but the signal never reaches it because the supervisor process terminates first.

**Fix**: Install SIGTERM/SIGINT signal handlers in `supervise()` that forward the received signal to the task subprocess via `os.kill()`. The child's existing handler then calls `on_kill()` as expected, restoring the Airflow 2 behavior.

**Signal flow after fix**:

1. K8s sends SIGTERM to supervisor (PID 1)
2. Supervisor's new handler forwards SIGTERM to task subprocess
3. Task subprocess's existing `_on_term` handler calls `operator.on_kill()`
4. Operator cleans up resources (pods, subprocesses, etc.)
5. Subprocess exits, supervisor's `wait()` returns normally

## Changes

- **`task-sdk/src/airflow/sdk/execution_time/supervisor.py`**: Added signal forwarding in `supervise()` function. Signal handlers are saved, installed before `process.wait()`, and restored in a `finally` block.
- **`task-sdk/tests/task_sdk/execution_time/test_supervisor.py`**: Added test that verifies SIGTERM forwarding from supervisor to subprocess triggers the operator's `on_kill()` hook.

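
For readers who want to see the forwarding pattern in isolation, here is a minimal standalone sketch of the save / install / wait / restore sequence described in the summary and changes above. It is not the code from this PR: the function name `supervise_with_forwarding`, the helper `_forward`, and the use of `subprocess.Popen` are assumptions made for illustration, and the real `supervise()` manages its subprocess differently.

```python
import os
import signal
import subprocess


def supervise_with_forwarding(cmd: list[str]) -> int:
    """Run cmd as a child process and relay SIGTERM/SIGINT to it.

    Hypothetical stand-in for the supervisor; returns the child's exit code.
    """
    child = subprocess.Popen(cmd)

    def _forward(signum, frame):
        # Relay the received signal so the child's own handler can run
        # its cleanup (the on_kill() step in the real task runner).
        try:
            os.kill(child.pid, signum)
        except ProcessLookupError:
            # The child has already exited and been reaped.
            pass

    # Save the previous handlers while installing the forwarding handler.
    previous = {
        sig: signal.signal(sig, _forward)
        for sig in (signal.SIGTERM, signal.SIGINT)
    }
    try:
        # Python retries the interrupted wait after the handler returns,
        # so this call ends only when the child itself exits.
        return child.wait()
    finally:
        # Restore whatever was installed before, as the PR does.
        for sig, handler in previous.items():
            if handler is not None:  # None: handler was not set from Python
                signal.signal(sig, handler)
```

In this sketch the supervisor does not exit on SIGTERM itself; it keeps waiting until the child has finished its cleanup, which matches step 5 of the signal flow. Note that `signal.signal()` can only be called from the main thread, and that a signal arriving between `Popen()` and handler installation would still hit the default behavior.
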
## Test plan

- [ ] New test `test_on_kill_hook_called_when_supervisor_receives_sigterm` verifies the signal forwarding chain
- [ ] Existing `test_on_kill_hook_called_when_sigkilled` still passes (no regression)
- [ ] Existing signal-related tests (`test_kill_escalation_path`, `test_exit_by_signal`) still pass
