SameerMesiah97 commented on PR #60717: URL: https://github.com/apache/airflow/pull/60717#issuecomment-3801614360
> If the driver pod fails, why can't we check the status in the query instead of doing a fallback? I see you are checking for this, and it seems to be pretty deterministic; we cannot have 2 running driver pods in this case, so it seems like this cannot happen, as no 2 driver pods will be running for the same SparkApplication. That is something k8s promises. The point is that you added the check for Running, which is already deterministic, so you may remove some code, as there will never be more than 1 pod in Running with the same labels for the same SparkApplication. Maybe I am missing something, but this seems to be the better and less error-prone solution.

I agree that under normal operation there should be at most one active driver pod at a time. But the reason for keeping the fallback ordering is that this code operates on the Kubernetes API response rather than the SparkApplication controller's intent. In abnormal states (e.g. termination delays or stale API observations), the listing can still return multiple pods, and phase alone might not be enough to pick the correct one. That said, if you feel the additional ordering is unnecessary, I can remove it.
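To illustrate the fallback ordering being discussed, here is a minimal sketch of how a selection like this could look. Note that `PodInfo` and `select_driver_pod` are hypothetical names for illustration only, not the actual code in the PR; the real implementation operates on `V1Pod` objects returned by the Kubernetes API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PodInfo:
    """Minimal stand-in for a pod returned by a label-filtered API listing."""
    name: str
    phase: str  # e.g. "Running", "Pending", "Succeeded", "Failed"
    creation_timestamp: datetime

def select_driver_pod(pods: list[PodInfo]) -> Optional[PodInfo]:
    """Pick the most plausible driver pod from the API response.

    Normally at most one pod is Running, but during termination delays
    or stale API observations the listing may still contain leftovers,
    so the ordering prefers Running pods first and then breaks ties by
    the newest creation timestamp.
    """
    if not pods:
        return None
    return max(
        pods,
        key=lambda p: (p.phase == "Running", p.creation_timestamp),
    )

pods = [
    PodInfo("driver-old", "Succeeded", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    PodInfo("driver-new", "Running", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]
print(select_driver_pod(pods).name)
```

Under the happy path the phase check alone suffices, as the reviewer notes; the timestamp tiebreak only matters when the listing contains stale entries.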
