manipatnam opened a new pull request, #62401:
URL: https://github.com/apache/airflow/pull/62401
## Description
Currently, when a deferred KubernetesPodOperator task is manually marked as
success/failed from the Airflow UI, the associated Kubernetes pod is never
cleaned up; it keeps running as an orphan. This is because
`KubernetesPodTrigger` has no `cleanup()` override, so trigger cancellation
is a no-op.
This PR fixes the deferrable-mode cleanup gap and adds a `cancel_on_kill`
parameter (default `True`) to give users explicit control over pod deletion on
kill in both sync and deferrable modes.
### Changes
**KubernetesPodOperator** (`operators/pod.py`):
- New `cancel_on_kill` parameter (default `True`). In sync mode, `on_kill()`
skips pod deletion when it is `False`. In deferrable mode, the value is
forwarded to the trigger.
- Passes `termination_grace_period` to the trigger for graceful shutdown.
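The operator-side wiring described above can be pictured with a small sketch. `PodOperatorKillSketch`, `deleted_pods` and `trigger_kwargs()` are hypothetical names used only for illustration; the sketch records a pod name instead of calling Kubernetes and is not the real `KubernetesPodOperator` code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PodOperatorKillSketch:
    """Hypothetical stand-in showing how cancel_on_kill gates deletion (not the real operator)."""

    pod_name: str
    cancel_on_kill: bool = True
    termination_grace_period: Optional[int] = None
    # Records simulated deletions instead of talking to a cluster.
    deleted_pods: List[str] = field(default_factory=list)

    def on_kill(self) -> None:
        # Sync mode: when the user opted out, leave the pod running.
        if not self.cancel_on_kill:
            return
        # Stands in for the real pod-deletion call.
        self.deleted_pods.append(self.pod_name)

    def trigger_kwargs(self) -> Dict[str, Any]:
        # Deferrable mode: the same settings are forwarded to the trigger,
        # which makes the deletion decision later in its cleanup().
        return {
            "pod_name": self.pod_name,
            "cancel_on_kill": self.cancel_on_kill,
            "termination_grace_period": self.termination_grace_period,
        }
```

With the default `cancel_on_kill=True`, `on_kill()` records a deletion; with `False` it returns early, and the same flag is what a deferred task would hand to its trigger.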
**KubernetesPodTrigger** (`triggers/pod.py`):
- New `cancel_on_kill` and `termination_grace_period` parameters.
- Implements `cleanup()` to delete the pod when the trigger is cancelled
(e.g. when a user marks a deferred task as success/failed), respecting
`cancel_on_kill`, `on_finish_action`, and `safe_to_cancel()`.
- Implements `safe_to_cancel()` by querying the actual task state (execution
API on Airflow 3.0+, DB on 2.x) to distinguish user-initiated kills from
triggerer rebalancing. Pods are preserved during rebalancing.
- Defensive error handling: if task state cannot be determined during
cleanup, pod deletion is skipped (fail-safe) with a warning log.
- Tracks a `_fired_event` flag so that cleanup after normal completion
short-circuits without an API/DB call.
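The order of the trigger-side checks listed above can be modelled as a small async sketch. Everything here is hypothetical: `PodTriggerCleanupSketch`, `mark_fired()`, and the injected `get_task_state` / `delete_pod` callables stand in for the execution API (3.0+) or DB (2.x) lookup and the hook call, task states are plain strings, and only `on_finish_action="keep_pod"` is assumed to suppress deletion. It illustrates the decision flow and is not the provider's code.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

log = logging.getLogger(__name__)

DEFERRED = "deferred"  # assumption: task states modelled as plain strings


class PodTriggerCleanupSketch:
    """Hypothetical model of the cleanup decisions described above."""

    def __init__(
        self,
        pod_name: str,
        *,
        get_task_state: Callable[[], Awaitable[str]],
        delete_pod: Callable[[str, Optional[int]], Awaitable[None]],
        cancel_on_kill: bool = True,
        on_finish_action: str = "delete_pod",
        termination_grace_period: Optional[int] = None,
    ) -> None:
        self.pod_name = pod_name
        self.get_task_state = get_task_state  # stands in for the API/DB lookup
        self.delete_pod = delete_pod  # stands in for the async hook call
        self.cancel_on_kill = cancel_on_kill
        self.on_finish_action = on_finish_action
        self.termination_grace_period = termination_grace_period
        self._fired_event = False

    def mark_fired(self) -> None:
        # Set when the trigger yields its final event (normal completion).
        self._fired_event = True

    async def safe_to_cancel(self) -> bool:
        # Still DEFERRED: the trigger is only moving to another triggerer.
        # Any other state: the task was ended, e.g. marked success/failed.
        state = await self.get_task_state()
        return state != DEFERRED

    async def cleanup(self) -> None:
        if self._fired_event:
            return  # normal completion: skip the state lookup entirely
        if not self.cancel_on_kill or self.on_finish_action == "keep_pod":
            return  # user opted out of deleting the pod on kill
        try:
            safe = await self.safe_to_cancel()
        except Exception:
            # Fail safe: with the state unknown, keep the pod.
            log.warning("Could not determine task state; not deleting pod %s", self.pod_name)
            return
        if safe:
            await self.delete_pod(self.pod_name, self.termination_grace_period)
```

Checking `_fired_event` first is what keeps the common path (the pod finished normally) free of any state lookup; the lookup only happens when the trigger is cancelled before yielding an event.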
**AsyncKubernetesHook** (`hooks/kubernetes.py`):
- `delete_pod` accepts an optional `grace_period_seconds` for graceful
termination.
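One way to make the grace period optional is to forward it only when it is set, so Kubernetes falls back to the pod's own `terminationGracePeriodSeconds` otherwise. In the official Kubernetes Python client, `CoreV1Api.delete_namespaced_pod` accepts a `grace_period_seconds` keyword; the sketch below uses a hypothetical `FakeCoreV1Api` that only records calls, and its `delete_pod` function is an illustration rather than the hook's actual implementation.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class FakeCoreV1Api:
    """Hypothetical stand-in for an async CoreV1Api client; only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Dict[str, Any]]] = []

    async def delete_namespaced_pod(self, name: str, namespace: str, **kwargs: Any) -> None:
        self.calls.append((name, namespace, kwargs))


async def delete_pod(
    api: FakeCoreV1Api,
    name: str,
    namespace: str,
    grace_period_seconds: Optional[int] = None,
) -> None:
    kwargs: Dict[str, Any] = {}
    if grace_period_seconds is not None:
        # Only send the option when set; otherwise the pod's own
        # terminationGracePeriodSeconds applies.
        kwargs["grace_period_seconds"] = grace_period_seconds
    await api.delete_namespaced_pod(name=name, namespace=namespace, **kwargs)
```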
**Tests** (`triggers/test_pod.py`):
- Tests for all cleanup paths: fired event, `cancel_on_kill=False`, triggerer
rebalancing, manual mark, and `keep_pod`.
- Tests for `safe_to_cancel()` with DEFERRED and non-DEFERRED states.
**Docs** (`operators.rst`):
- New "Pod cleanup on kill" section documenting `cancel_on_kill` behavior in
both sync and deferrable modes.
closes: #62398