manipatnam opened a new pull request, #62401:
URL: https://github.com/apache/airflow/pull/62401

   ## Description
   
   Currently, when a deferred `KubernetesPodOperator` task is manually marked as success/failed from the Airflow UI, the associated Kubernetes pod is never cleaned up and keeps running as an orphan. This is because `KubernetesPodTrigger` has no `cleanup()` override, so cancelling the trigger is a no-op.
   
   This PR fixes the deferrable-mode cleanup gap and adds a `cancel_on_kill` parameter (default `True`) to give users explicit control over pod deletion on kill in both sync and deferrable modes.
   
   ### Changes
   
   **KubernetesPodOperator** (`operators/pod.py`):
   - New `cancel_on_kill` parameter (default `True`). In sync mode, `on_kill()` skips pod deletion when it is `False`; in deferrable mode, the value is forwarded to the trigger.
   - Passes `termination_grace_period` to the trigger for graceful shutdown.
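
To make the sync/deferrable split concrete, here is a minimal sketch of how `cancel_on_kill` could gate `on_kill()` and be forwarded to the trigger. `PodOperatorSketch`, its `delete_pod()` and `trigger_kwargs()` methods, and the dictionary keys are hypothetical names chosen for illustration, not the operator's actual code; the real Kubernetes delete call is replaced by an in-memory record.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PodOperatorSketch:
    """Hypothetical model of the kill-related parts of KubernetesPodOperator."""

    pod_name: str
    namespace: str = "default"
    cancel_on_kill: bool = True  # default True, as described in the PR
    termination_grace_period: Optional[int] = None
    # Records deletions instead of calling the Kubernetes API.
    deleted_pods: list[tuple[str, str]] = field(default_factory=list)

    def delete_pod(self) -> None:
        self.deleted_pods.append((self.namespace, self.pod_name))

    def on_kill(self) -> None:
        # Sync mode: the running task is being killed; honour the opt-out.
        if not self.cancel_on_kill:
            return
        self.delete_pod()

    def trigger_kwargs(self) -> dict[str, Any]:
        # Deferrable mode: the task is not running on a worker while deferred,
        # so the settings travel with the trigger, which does the cleanup itself.
        return {
            "pod_name": self.pod_name,
            "namespace": self.namespace,
            "cancel_on_kill": self.cancel_on_kill,
            "termination_grace_period": self.termination_grace_period,
        }
```

With `cancel_on_kill=False`, `on_kill()` leaves the pod alone, and the same flag (plus the grace period) reaches the trigger unchanged.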
   
   **KubernetesPodTrigger** (`triggers/pod.py`):
   - New `cancel_on_kill` and `termination_grace_period` parameters.
   - Implements `cleanup()` to delete the pod when the trigger is cancelled (e.g. when a user marks the deferred task as success/failed), respecting `cancel_on_kill`, `on_finish_action`, and `safe_to_cancel()`.
   - Implements `safe_to_cancel()` by querying the actual task state (via the execution API on Airflow 3.0+, the metadata DB on 2.x) to distinguish user-initiated kills from triggerer rebalancing. Pods are preserved during rebalancing.
   - Defensive error handling: if the task state cannot be determined during cleanup, pod deletion is skipped (fail-safe) and a warning is logged.
   - Tracks a `_fired_event` flag so that cleanup after normal completion returns early, without an API/DB call.
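
The cleanup decision order described above might look roughly like the sketch below. It assumes the task-state lookup and the pod deletion are injected as coroutines (in the PR they go through the execution API or DB, and the async hook, respectively). `PodTriggerSketch`, `get_task_state`, and the `"deferred"` string are hypothetical stand-ins, and the sketch simplifies `on_finish_action` so that only `keep_pod` suppresses deletion.

```python
import logging
from typing import Awaitable, Callable, Optional

log = logging.getLogger(__name__)

# Hypothetical stand-in for TaskInstanceState.DEFERRED.
DEFERRED = "deferred"


class PodTriggerSketch:
    """Hypothetical, trimmed-down model of the trigger's cleanup path."""

    def __init__(
        self,
        pod_name: str,
        get_task_state: Callable[[], Awaitable[str]],
        delete_pod: Callable[[str, Optional[int]], Awaitable[None]],
        cancel_on_kill: bool = True,
        on_finish_action: str = "delete_pod",
        termination_grace_period: Optional[int] = None,
    ) -> None:
        self.pod_name = pod_name
        self._get_task_state = get_task_state
        self._delete_pod = delete_pod
        self.cancel_on_kill = cancel_on_kill
        self.on_finish_action = on_finish_action
        self.termination_grace_period = termination_grace_period
        # A real trigger would set this in run() just before yielding its event.
        self._fired_event = False

    async def safe_to_cancel(self) -> bool:
        # Still DEFERRED: the trigger is only being moved to another triggerer.
        # Any other state: the user changed it (e.g. marked success/failed).
        state = await self._get_task_state()
        return state != DEFERRED

    async def cleanup(self) -> None:
        # Normal completion: nothing to do, and no state lookup is needed.
        if self._fired_event:
            return
        # The user opted out, or the pod should be kept regardless.
        if not self.cancel_on_kill or self.on_finish_action == "keep_pod":
            return
        try:
            safe = await self.safe_to_cancel()
        except Exception:
            # Fail safe: if the reason for cancellation is unknown, keep the pod.
            log.warning("Could not determine task state; not deleting pod %s", self.pod_name)
            return
        if safe:
            await self._delete_pod(self.pod_name, self.termination_grace_period)
```

Checking the cheap local conditions first keeps the state lookup off the normal-completion path, and any failure in that lookup falls back to keeping the pod.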
   
   **AsyncKubernetesHook** (`hooks/kubernetes.py`):
   - `delete_pod` accepts an optional `grace_period_seconds` argument for graceful termination.
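
One possible shape for the optional grace period is sketched below: the keyword is forwarded only when set, so an unset value leaves Kubernetes to apply the pod's own `terminationGracePeriodSeconds`, while `0` requests immediate deletion. `RecordingCoreV1Api` is a test double and `delete_pod` here is a hypothetical free function, not `AsyncKubernetesHook.delete_pod` itself; the `grace_period_seconds` keyword mirrors the option of the same name on the Kubernetes Python client's `delete_namespaced_pod`.

```python
from typing import Any, Optional


class RecordingCoreV1Api:
    """Test double for a Kubernetes CoreV1Api client; records each delete call."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    async def delete_namespaced_pod(self, name: str, namespace: str, **kwargs: Any) -> None:
        self.calls.append({"name": name, "namespace": namespace, **kwargs})


async def delete_pod(
    api: Any, name: str, namespace: str, grace_period_seconds: Optional[int] = None
) -> None:
    """Hypothetical helper: delete a pod, optionally overriding its grace period."""
    kwargs: dict[str, Any] = {}
    if grace_period_seconds is not None:
        # Explicit override; 0 asks Kubernetes to delete immediately.
        kwargs["grace_period_seconds"] = grace_period_seconds
    # When unset, Kubernetes falls back to the pod's terminationGracePeriodSeconds.
    await api.delete_namespaced_pod(name=name, namespace=namespace, **kwargs)
```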
   
   **Tests** (`triggers/test_pod.py`):
   - Tests for all cleanup paths: fired event, `cancel_on_kill=False`, triggerer rebalancing, manual mark, `keep_pod`.
   - Tests for `safe_to_cancel()` with DEFERRED and non-DEFERRED states.
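
Tests for `safe_to_cancel()` can mock the state lookup so that no real API or DB is needed. A rough sketch using the standard library's `unittest.mock.AsyncMock`, with a hypothetical `TriggerUnderTest` class standing in for `KubernetesPodTrigger` and plain strings standing in for `TaskInstanceState` values:

```python
import asyncio
from unittest import mock

DEFERRED = "deferred"  # hypothetical stand-in for TaskInstanceState.DEFERRED


class TriggerUnderTest:
    """Hypothetical minimal trigger; the state lookup is injected so tests can mock it."""

    def __init__(self, get_task_state) -> None:
        self.get_task_state = get_task_state

    async def safe_to_cancel(self) -> bool:
        state = await self.get_task_state()
        return state != DEFERRED


def check_safe_to_cancel(state: str, expected: bool) -> None:
    lookup = mock.AsyncMock(return_value=state)
    trigger = TriggerUnderTest(lookup)
    assert asyncio.run(trigger.safe_to_cancel()) is expected
    # The lookup must actually be awaited, exactly once.
    lookup.assert_awaited_once()


def test_safe_to_cancel_states() -> None:
    # DEFERRED: triggerer rebalancing, so the pod must be kept.
    check_safe_to_cancel(DEFERRED, False)
    # Any other state (e.g. manually marked success/failed): safe to delete.
    for state in ("success", "failed"):
        check_safe_to_cancel(state, True)
```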
   
   **Docs** (`operators.rst`):
   - New "Pod cleanup on kill" section documenting `cancel_on_kill` behavior in both sync and deferrable modes.
   
   closes: #62398 