jason810496 commented on code in PR #68067:
URL: https://github.com/apache/airflow/pull/68067#discussion_r3386774109


##########
generated/provider_dependencies.json:
##########
@@ -1013,6 +1013,7 @@
       "http",
       "microsoft.azure",
       "microsoft.mssql",
+      "mongo",

Review Comment:
   These changes should be covered by
https://github.com/apache/airflow/pull/68324.



##########
providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py:
##########
@@ -397,8 +441,19 @@ def poll_until_complete(self, external_id: JsonValue, context: Context) -> None:
                 self._hook._run_post_submit_commands()
             return
         if self._hook._is_kubernetes:
-            # TODO: poll K8s pod phase until terminal
-            raise NotImplementedError("K8s poll not yet implemented")
+            if external_id is not None:
+                _, pod_name = str(external_id).split(":", 1)
+                self._hook._kubernetes_driver_pod = pod_name
+            self._hook._poll_k8s_driver_via_api()
+            # The driver pod is deleted on success, so cache the terminal phase before it
+            # disappears. Failed jobs raise before reaching here, so only "Succeeded" is ever
+            # cached. A missing key on retry means the pod was garbage collected after failure, and
+            # resubmitting fresh is the right behaviour in that case.
+            task_store = context.get("task_store")
+            if task_store is not None:

Review Comment:
   
   ```suggestion
               if (task_store := context.get("task_store")) is not None:
   ```



##########
providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py:
##########
@@ -397,8 +441,19 @@ def poll_until_complete(self, external_id: JsonValue, context: Context) -> None:
                 self._hook._run_post_submit_commands()
             return
         if self._hook._is_kubernetes:
-            # TODO: poll K8s pod phase until terminal
-            raise NotImplementedError("K8s poll not yet implemented")
+            if external_id is not None:
+                _, pod_name = str(external_id).split(":", 1)
+                self._hook._kubernetes_driver_pod = pod_name
+            self._hook._poll_k8s_driver_via_api()
+            # The driver pod is deleted on success, so cache the terminal phase before it
+            # disappears. Failed jobs raise before reaching here, so only "Succeeded" is ever
+            # cached. A missing key on retry means the pod was garbage collected after failure, and
+            # resubmitting fresh is the right behaviour in that case.
+            task_store = context.get("task_store")
+            if task_store is not None:
+                task_store.set(self._K8S_DRIVER_STATUS_KEY, "Succeeded")

Review Comment:
   `_poll_k8s_driver_via_api()` returns without raising on **two** different 
paths: the genuine `Succeeded` break, and the pre-existing 404 branch that 
exits when the driver pod has vanished (e.g. deleted by `on_kill`).
   
   This block treats *any* non-raising return as success and unconditionally 
caches `"Succeeded"`, so a killed/vanished pod gets recorded as succeeded, and 
on the next retry `get_job_status` reads that cached `"Succeeded"` first and 
reports the task successful without ever resubmitting.
   
   Suggest gating the cache on an actually-observed phase: have 
`_poll_k8s_driver_via_api()` return the terminal phase, `task_store.set(..., 
"Succeeded")` only when it equals `"Succeeded"`, and return `None` on the 
404/vanished path so nothing is cached (the retry then re-queries and 
resubmits, which is correct).



##########
providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py:
##########
@@ -397,8 +441,19 @@ def poll_until_complete(self, external_id: JsonValue, context: Context) -> None:
                 self._hook._run_post_submit_commands()
             return
         if self._hook._is_kubernetes:
-            # TODO: poll K8s pod phase until terminal
-            raise NotImplementedError("K8s poll not yet implemented")
+            if external_id is not None:
+                _, pod_name = str(external_id).split(":", 1)
+                self._hook._kubernetes_driver_pod = pod_name

Review Comment:
   How about adding a small utility for parsing the `external_id` in K8s mode,
so that the split and its validation live in one place?
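
   Something along these lines, as a rough sketch. The helper name `parse_k8s_external_id` is hypothetical, and the `<prefix>:<pod_name>` layout is only inferred from the `split(":", 1)` call in the diff; what the prefix actually holds is not shown there.

```python
from __future__ import annotations


def parse_k8s_external_id(external_id: object) -> tuple[str, str]:
    """Split an external id of the assumed form '<prefix>:<pod_name>'.

    Raises ValueError instead of failing later with an unpacking error
    or silently accepting an empty pod name.
    """
    # partition() splits at the first ":" only, like split(":", 1).
    prefix, sep, pod_name = str(external_id).partition(":")
    if not sep or not pod_name:
        raise ValueError(f"Malformed K8s external_id: {external_id!r}")
    return prefix, pod_name
```

   The call site would then reduce to `_, pod_name = parse_k8s_external_id(external_id)`, and a malformed id fails with a clear message.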



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
