Pranaykarvi opened a new pull request, #63915: URL: https://github.com/apache/airflow/pull/63915
## Problem

When a Kubernetes Job retries (creates a new pod after a pod failure), `GKEJobTrigger` keeps tracking the original pod names set at trigger creation time. It waits for XCom on the failed pod instead of the new retry pod. As a result, the XCom sidecar on the retry pod never receives a termination signal, so the pod keeps running until the job's `activeDeadlineSeconds` is exceeded, which fails the task.

## Fix

Before XCom extraction, the trigger re-discovers all current pods for the job using the `job-name=<job_name>` label selector. It filters the result to succeeded pods only and extracts XCom from those. If no succeeded pods are found, it falls back to the original pod list.

Added a `list_pods()` async method to `GKEKubernetesAsyncHook` to support pod discovery by label selector.

## Testing

Added the unit test `test_run_do_xcom_push_uses_succeeded_retry_pod_not_original_failed_pod`, which verifies that the trigger uses the succeeded retry pod for XCom extraction when the original pod failed.

Fixes #63838
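The pod selection described under Fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Pod`, `FakeHook`, and `pods_for_xcom` are hypothetical names, and the `list_pods(namespace, label_selector)` signature is an assumption about the new hook method. Only the `job-name=<job_name>` selector, the succeeded-only filter, and the fallback to the original pod list come from the description above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical stand-in for a Kubernetes pod object; only the fields used here.
    name: str
    phase: str  # Kubernetes pod phase, e.g. "Running", "Succeeded", "Failed"


class FakeHook:
    """Hypothetical stand-in for the async hook. The list_pods signature is assumed."""

    def __init__(self, pods_by_selector):
        self._pods_by_selector = pods_by_selector

    async def list_pods(self, namespace, label_selector):
        # A real hook would query the Kubernetes API; this returns canned data.
        return self._pods_by_selector.get(label_selector, [])


async def pods_for_xcom(hook, namespace, job_name, original_pod_names):
    """Return the pod names to extract XCom from.

    Re-discovers the job's pods by label, keeps only succeeded ones, and
    falls back to the names recorded at trigger creation if none succeeded.
    """
    pods = await hook.list_pods(
        namespace=namespace,
        label_selector=f"job-name={job_name}",
    )
    succeeded = [pod.name for pod in pods if pod.phase == "Succeeded"]
    return succeeded or list(original_pod_names)


if __name__ == "__main__":
    # The original pod failed and the Job created a retry pod that succeeded.
    hook = FakeHook(
        {"job-name=demo": [Pod("demo-abc", "Failed"), Pod("demo-xyz", "Succeeded")]}
    )
    print(asyncio.run(pods_for_xcom(hook, "default", "demo", ["demo-abc"])))
```

With the failed original pod `demo-abc` and the succeeded retry pod `demo-xyz`, the sketch selects only the retry pod. If the label query returns no succeeded pods, the original names are returned unchanged.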
