kaxil commented on code in PR #67118:
URL: https://github.com/apache/airflow/pull/67118#discussion_r3270207194
##########
task-sdk/src/airflow/sdk/__init__.py:
##########
@@ -65,6 +65,7 @@
"ProductMapper",
"RetryAction",
"RetryDecision",
+ "ResumableJobMixin",
Review Comment:
`"ResumableJobMixin"` is out of alphabetical order in `__all__` -- it sits
between `RetryDecision` and `RetryPolicy`, but should be just after
`ProductMapper` and before `RetryAction`. Same issue at line 227 in the lazy
import dict.
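
A quick way to confirm the ordering is sketched below. It is only an illustration: the list is the excerpt visible in this diff plus `RetryPolicy`, not the full `__all__`, the helper name `first_out_of_order` is made up for this comment, and it assumes plain case-sensitive string ordering is the intended sort.

```python
def first_out_of_order(names):
    """Return the first adjacent (previous, current) pair that breaks ascending order, or None."""
    for prev, cur in zip(names, names[1:]):
        if prev > cur:
            return (prev, cur)
    return None


# Excerpt as it stands in this PR (surrounding entries omitted).
as_submitted = [
    "ProductMapper",
    "RetryAction",
    "RetryDecision",
    "ResumableJobMixin",
    "RetryPolicy",
]

# Same excerpt with the new entry moved after "ProductMapper".
as_suggested = [
    "ProductMapper",
    "ResumableJobMixin",
    "RetryAction",
    "RetryDecision",
    "RetryPolicy",
]

print(first_out_of_order(as_submitted))
print(first_out_of_order(as_suggested))
```

The first call reports the `RetryDecision` / `ResumableJobMixin` pair, because `"Res"` sorts before `"Ret"`; the second finds nothing out of place.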
##########
providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py:
##########
@@ -198,8 +221,63 @@ def execute(self, context: Context) -> None:
self.conf = inject_transport_information_into_spark_properties(self.conf, context)
if self._hook is None:
self._hook = self._get_hook()
+ if self._hook._should_track_driver_status:
+ return self.execute_resumable(context)
self._hook.submit(self.application)
+ def submit_job(self, context: Context) -> str:
+ driver_id = self._hook.submit(self.application)
+ if not driver_id:
+ raise RuntimeError("spark-submit did not return a driver ID")
+ self.log.info("Spark driver submitted: %s", driver_id)
+ return driver_id
+
+ def get_job_status(self, external_id: str) -> str:
+ if self._hook._is_yarn:
Review Comment:
The YARN and Kubernetes branches in `get_job_status` / `is_job_active` /
`is_job_succeeded` / `poll_until_complete` are unreachable from this PR:
`execute_resumable` is only called when `_should_track_driver_status` is True,
which is only set for `spark://` + cluster mode. YARN and K8s never enter this
code path.
They read as live branches that a reviewer must verify, but they can never execute.
Suggest either dropping them (cleaner -- add them alongside the routing change
in the follow-up PR) or marking them clearly as scaffolding with a reference to
the follow-up PR number.
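
To make the reachability argument concrete, here is a simplified model of the routing. It is a sketch based only on the condition described above (`spark://` master plus cluster deploy mode), not the hook's actual implementation; the function names and the example master URLs are hypothetical.

```python
def should_track_driver_status(master: str, deploy_mode: str) -> bool:
    # Stand-in for the hook flag, assuming it is set only for a
    # standalone master (spark://) in cluster deploy mode.
    return master.startswith("spark://") and deploy_mode == "cluster"


def enters_resumable_path(master: str, deploy_mode: str) -> bool:
    # Mirrors the guard added in execute(): the resumable methods run
    # only when driver-status tracking is enabled.
    return should_track_driver_status(master, deploy_mode)


# Hypothetical master URLs, used purely for illustration.
cases = [
    ("spark://host:7077", "cluster"),
    ("spark://host:7077", "client"),
    ("yarn", "cluster"),
    ("k8s://https://host:6443", "cluster"),
]

for master, deploy_mode in cases:
    print(master, deploy_mode, enters_resumable_path(master, deploy_mode))
```

Under this model only the first case reaches `get_job_status` and its siblings, so a YARN or Kubernetes branch inside those methods has no caller until the routing itself changes.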