SameerMesiah97 commented on PR #60717: URL: https://github.com/apache/airflow/pull/60717#issuecomment-3801614360
> If the driver pod fails, why can't we check the status in the query instead of doing a fallback? I see you are checking for this, and it seems to be pretty deterministic; we cannot have 2 running driver pods in this case, so it seems like this cannot happen, as no 2 driver pods will be running for the same SparkApplication. That is something k8s promises. The point is that you added the check for Running, which is already deterministic, so you may remove some code, as there will never be more than 1 pod in Running with the same labels for the same SparkApplication. Maybe I am missing something, but this seems to be the better and less error-prone solution.

I agree that under normal operation there should be at most one active driver pod at a time. But the reason for keeping the fallback ordering is that this code operates on the Kubernetes API response rather than the SparkApplication controller's intent. In abnormal states (e.g. termination delays or stale API observations), the listing can still return multiple pods, and phase alone might not be enough to pick the correct one. That said, if you feel the additional ordering is unnecessary, I can remove it.
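To illustrate the fallback ordering being discussed, here is a minimal sketch of how a selection like this could look. Note that `PodInfo` and `select_driver_pod` are hypothetical names for illustration only, not the actual code in the PR; the real implementation operates on `V1Pod` objects returned by the Kubernetes API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PodInfo:
    """Minimal stand-in for a pod returned by a label-filtered API listing."""
    name: str
    phase: str  # e.g. "Running", "Pending", "Succeeded", "Failed"
    creation_timestamp: datetime

def select_driver_pod(pods: list[PodInfo]) -> Optional[PodInfo]:
    """Pick the most plausible driver pod from the API response.

    Normally at most one pod is Running, but during termination delays
    or stale API observations the listing may still contain leftovers,
    so the ordering prefers Running pods first and then breaks ties by
    the newest creation timestamp.
    """
    if not pods:
        return None
    return max(
        pods,
        key=lambda p: (p.phase == "Running", p.creation_timestamp),
    )

pods = [
    PodInfo("driver-old", "Succeeded", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    PodInfo("driver-new", "Running", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]
print(select_driver_pod(pods).name)
```

Under the happy path the phase check alone suffices, as the reviewer notes; the timestamp tiebreak only matters when the listing contains stale entries.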
