Nataneljpwd commented on code in PR #61110:
URL: https://github.com/apache/airflow/pull/61110#discussion_r2733584088


##########
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py:
##########
@@ -248,23 +247,28 @@ def find_spark_job(self, context, exclude_checked: bool = True):
             self._build_find_pod_label_selector(context, exclude_checked=exclude_checked)
             + ",spark-role=driver"
         )
-        pod_list = self.client.list_namespaced_pod(self.namespace, label_selector=label_selector).items
+        # since we did not specify a resource version, we make sure to get the latest data
+        # we make sure we get only running or pending pods.
+        field_selector = self._get_field_selector()
+        pod_list = self.client.list_namespaced_pod(
+            self.namespace, label_selector=label_selector, field_selector=field_selector
+        ).items
 
         pod = None
         if len(pod_list) > 1:
             # When multiple pods match the same labels, select one deterministically,
-            # preferring a Running pod, then creation time, with name as a tie-breaker.
+            # preferring a Running or Pending pod, as if another pod was created, it will be in either the
+            # terminating status or a terminal phase, if it is in terminating, it will have a
+            # deletion_timestamp.
+            # pending pods need to also be selected, as what if a driver pod just failed and a new pod is
+            # created, we do not want the task to fail.
             pod = max(
                 pod_list,
-                key=lambda p: (
-                    p.status.phase == PodPhase.RUNNING,
-                    p.metadata.creation_timestamp or datetime.min.replace(tzinfo=timezone.utc),
-                    p.metadata.name or "",
-                ),
+                key=lambda p: (p.metadata.deletion_timestamp is None, p.metadata.name or ""),
             )

Review Comment:
   If I set neither a resourceVersion nor a resourceVersionMatch, the request automatically triggers a quorum read every time.
   
   > Unless you have strong consistency requirements, using resourceVersionMatch=NotOlderThan and a known resourceVersion is preferable since it can achieve better performance and scalability of your cluster than leaving resourceVersion and resourceVersionMatch unset, which requires quorum read to be served.
   
   This is stated in the [Kubernetes API concepts documentation](https://kubernetes.io/docs/reference/using-api/api-concepts/).
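
To make the two request modes from the quoted documentation concrete, here is a minimal sketch. `build_pod_list_kwargs` is a hypothetical helper, not part of this PR or of the provider. It only builds the keyword arguments for a list call, assuming the client accepts `resource_version` and `resource_version_match` keyword arguments.

```python
from typing import Optional


def build_pod_list_kwargs(
    label_selector: str,
    field_selector: Optional[str] = None,
    resource_version: Optional[str] = None,
) -> dict:
    """Build keyword arguments for a pod list call (hypothetical helper)."""
    kwargs = {"label_selector": label_selector}
    if field_selector:
        kwargs["field_selector"] = field_selector
    if resource_version is not None:
        # A known resourceVersion combined with NotOlderThan is the mode the
        # quoted documentation describes as preferable for performance.
        kwargs["resource_version"] = resource_version
        kwargs["resource_version_match"] = "NotOlderThan"
    # With both left unset, the documentation says the request requires a
    # quorum read to be served, so it returns the most recent data.
    return kwargs


# No resource version: the strongly consistent (quorum read) request.
strict = build_pod_list_kwargs("spark-role=driver")
# Known resource version: the relaxed request ("12345" is a made-up value).
relaxed = build_pod_list_kwargs("spark-role=driver", resource_version="12345")
print(strict)
print(relaxed)
```

The resulting dictionary could be passed along as `list_namespaced_pod(namespace, **kwargs)`, provided the installed client version supports `resource_version_match`. The code in this PR corresponds to the first case, where both fields are left unset.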


