cruseakshay opened a new pull request, #60626: URL: https://github.com/apache/airflow/pull/60626
## Problem

When using `KubernetesPodOperator` with cluster autoscaling, task pods can be preempted by higher-priority daemonsets during node bootstrap. This results in a 404 error when Airflow tries to read the pod status, causing immediate task failure instead of allowing Kubernetes to reschedule the pod.

Fixes #59626

## Solution

Introduce **state-aware retry logic** that tracks whether a pod ever reached the `Running` state:

- **Pod never reached Running** → Raise `PodPreemptedException` (retriable)
- **Pod was Running** → Raise `PodNotFoundException` (terminal failure)

This prevents duplicate execution of non-idempotent tasks while allowing safe retries for pods preempted before they started.

## Changes

| File | Change |
|------|--------|
| `exceptions.py` | Add `PodNotFoundException` and `PodPreemptedException` |
| `pod_manager.py` | Add `PodPhaseTracker` dataclass and 404 handling logic |
| `hooks/kubernetes.py` | Add phase tracker support to async `get_pod()` |
| `triggers/pod.py` | Integrate phase tracking in `KubernetesPodTrigger` |
| `kubernetes_helper_functions.py` | Add `PodPreemptedException` to retry logic |
| `test_pod_manager.py` | Add comprehensive tests for new functionality |

## Testing

- Unit tests for `PodPhaseTracker` state transitions
- Unit tests for 404 handling with different pod states
- Backward compatibility tests (no tracker = existing behavior)

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes (please specify the tool below)

Generated-by: Claude (Cursor) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
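For readers skimming the notification, the state-aware decision described under "Solution" can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in the PR: the class names `PodPhaseTracker`, `PodPreemptedException`, and `PodNotFoundException` come from the description above, while the `reached_running` field, the `observe()` method, and the `on_pod_not_found()` helper are hypothetical names chosen for this example. The real implementation works with Kubernetes API objects and may differ in every detail.

```python
from dataclasses import dataclass
from typing import Optional


class PodPreemptedException(Exception):
    """Pod vanished before it ever reached Running; safe to retry."""


class PodNotFoundException(Exception):
    """Pod vanished after it had been Running; treated as terminal."""


@dataclass
class PodPhaseTracker:
    # Hypothetical field name: latches to True once Running is observed.
    reached_running: bool = False

    def observe(self, phase: str) -> None:
        # Hypothetical method: record each phase seen while polling.
        # The flag is never reset, so a later Pending does not clear it.
        if phase == "Running":
            self.reached_running = True


def on_pod_not_found(
    tracker: Optional[PodPhaseTracker],
    pod_name: str,
    original_error: Exception,
) -> None:
    """Hypothetical helper: decide which exception a 404 should become."""
    if tracker is None:
        # No tracker supplied: keep the pre-existing behavior and
        # surface the original error unchanged.
        raise original_error
    if tracker.reached_running:
        # The task may already have had side effects, so do not retry.
        raise PodNotFoundException(f"Pod {pod_name} disappeared after Running")
    # The container never started, so a retry cannot duplicate work.
    raise PodPreemptedException(f"Pod {pod_name} disappeared before Running")
```

In this sketch the polling loop would call `tracker.observe(phase)` on every successful status read and call `on_pod_not_found(...)` only when the read fails with a 404. Latching the flag, rather than storing only the latest phase, is what lets the handler distinguish "never started" from "started and then lost".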
