jason810496 opened a new issue, #68505:
URL: https://github.com/apache/airflow/issues/68505

# Refactor `SparkSubmitOperator` resumable backends into separate methods/classes

## Summary

The `SparkSubmitOperator` `ResumableJobMixin` implementation now supports three deployment backends: Spark standalone driver-status tracking, YARN cluster mode, and Kubernetes driver-pod tracking. Each mixin method branches on the backend inline, so per-backend logic is scattered across many methods instead of living in one place per backend. This issue tracks decoupling them.

## Background

Resumability for `SparkSubmitOperator` landed incrementally:

- #67118: standalone Spark
- #67473: YARN
- #67715: K8s driver-tracking building block
- #68067: wires the K8s path into `ResumableJobMixin`

During review of #68067, the refactor was raised as a non-blocking idea, and the author agreed to follow up after the 3.3.0 release (https://github.com/apache/airflow/pull/68067#issuecomment-4698329231).

## Problem

In `providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py`, each of the following `ResumableJobMixin` methods carries its own backend branching:

- `submit_job`
- `get_job_status`
- `is_job_active`
- `is_job_succeeded`
- `poll_until_complete`
- `on_kill`

Every method repeats the same `if self._hook._is_yarn_cluster_mode: ... if self._hook._is_kubernetes: ... else (standalone)` chain. A single backend's behaviour is therefore spread across six methods, which makes the flow hard to follow, easy to break when adding a backend, and awkward to test in isolation.

## Proposed change

Separate each backend's logic so that it is cohesive. One option is a per-backend strategy/handler class (standalone / YARN / K8s) implementing a common interface (`submit_job`, `get_job_status`, `is_job_active`, `is_job_succeeded`, `poll_until_complete`, `on_kill`), with the operator selecting the handler once based on deploy mode and tracking flags. A lighter alternative is to group each backend's branch into dedicated private methods. Decide between the two during design.
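A minimal sketch of the strategy/handler option, for discussion only. Every name here (`StubHook`, `ResumableBackend`, `StandaloneBackend`, `YarnBackend`, `KubernetesBackend`, `select_backend`, `ScriptedKubernetesBackend`) is hypothetical and not part of the provider. The method signatures, the shared polling loop, and the state sets are assumptions rather than the current `ResumableJobMixin` code. The stub hook only mirrors the two private flags named above (`_is_yarn_cluster_mode`, `_is_kubernetes`). Backend-specific I/O (spark-submit, YARN, Kubernetes API calls) is left as `NotImplementedError`.

```python
import time
from dataclasses import dataclass


@dataclass
class StubHook:
    """Hypothetical stand-in exposing only the two flags the mixin branches on."""

    _is_yarn_cluster_mode: bool = False
    _is_kubernetes: bool = False


class ResumableBackend:
    """Hypothetical common interface; the real signatures may differ."""

    name = "base"
    ACTIVE_STATES: frozenset = frozenset()
    SUCCESS_STATES: frozenset = frozenset()

    # Backend-specific I/O: each concrete backend would implement these.
    def submit_job(self) -> str:
        raise NotImplementedError

    def get_job_status(self, job_id: str) -> str:
        raise NotImplementedError

    def on_kill(self, job_id: str) -> None:
        raise NotImplementedError

    # Shared logic, written once and driven by the per-backend state sets.
    def is_job_active(self, status: str) -> bool:
        return status in self.ACTIVE_STATES

    def is_job_succeeded(self, status: str) -> bool:
        return status in self.SUCCESS_STATES

    def poll_until_complete(self, job_id: str, interval: float = 0.0) -> str:
        # Re-query the backend until it reports a non-active (terminal) state.
        status = self.get_job_status(job_id)
        while self.is_job_active(status):
            time.sleep(interval)
            status = self.get_job_status(job_id)
        return status


class StandaloneBackend(ResumableBackend):
    name = "standalone"
    # Illustrative Spark standalone driver states.
    ACTIVE_STATES = frozenset({"SUBMITTED", "RUNNING", "RELAUNCHING"})
    SUCCESS_STATES = frozenset({"FINISHED"})


class YarnBackend(ResumableBackend):
    name = "yarn"
    # Illustrative YARN application states; a real success check would
    # likely also inspect the application's final status.
    ACTIVE_STATES = frozenset({"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED", "RUNNING"})
    SUCCESS_STATES = frozenset({"FINISHED"})


class KubernetesBackend(ResumableBackend):
    name = "kubernetes"
    # Kubernetes pod phases for the Spark driver pod.
    ACTIVE_STATES = frozenset({"Pending", "Running"})
    SUCCESS_STATES = frozenset({"Succeeded"})


def select_backend(hook: StubHook) -> ResumableBackend:
    """Pick the backend once, checking flags in the same order the mixin does."""
    if hook._is_yarn_cluster_mode:
        return YarnBackend()
    if hook._is_kubernetes:
        return KubernetesBackend()
    return StandaloneBackend()


class ScriptedKubernetesBackend(KubernetesBackend):
    """Test double: replays a fixed sequence of driver-pod phases."""

    def __init__(self, phases: list[str]) -> None:
        self._phases = iter(phases)

    def get_job_status(self, job_id: str) -> str:
        return next(self._phases)


backend = ScriptedKubernetesBackend(["Pending", "Running", "Succeeded"])
final = backend.poll_until_complete("spark-driver-pod")
print(select_backend(StubHook(_is_kubernetes=True)).name)  # kubernetes
print(final, backend.is_job_succeeded(final))  # Succeeded True
```

In this shape, each backend supplies only its states and its three I/O methods, while the status checks and the polling loop live once in the base class. The scripted subclass shows how one backend could be unit-tested without touching the others. The lighter private-method alternative would keep the same per-backend grouping inside the mixin, without the class hierarchy.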

## Acceptance criteria

- Per-backend logic is cohesive (one class or one method group per backend), not interleaved across the mixin methods.
- Backend selection happens once instead of being re-derived in every method.
- Existing behaviour is unchanged; current tests pass, and per-backend logic is unit-testable in isolation.
- No public API change to `SparkSubmitOperator`.

## Notes

- Non-breaking, internal refactor; target after the 3.3.0 release.
