jason810496 opened a new issue, #68505:
URL: https://github.com/apache/airflow/issues/68505

   # Refactor `SparkSubmitOperator` resumable backends into separate methods/classes
   
   ## Summary
   
   The `SparkSubmitOperator` `ResumableJobMixin` implementation now supports three
   deployment backends (Spark standalone driver-status tracking, YARN cluster mode,
   and Kubernetes driver-pod tracking). Each mixin method branches on the backend
   inline, so per-backend logic is scattered across many methods instead of living
   in one place per backend. This issue tracks decoupling them.
   
   ## Background
   
   Resumability for `SparkSubmitOperator` landed incrementally:
   
   - #67118 — standalone Spark
   - #67473 — YARN
   - #67715 — K8s driver tracking building block
   - #68067 — wires the K8s path into `ResumableJobMixin`
   
   During review of #68067, the refactor was raised as a non-blocking idea and the
   author agreed to follow up after the 3.3.0 release
   (https://github.com/apache/airflow/pull/68067#issuecomment-4698329231).
   
   ## Problem
   
   In `providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py`,
   the `ResumableJobMixin` methods each carry their own backend branching:
   
   - `submit_job`
   - `get_job_status`
   - `is_job_active`
   - `is_job_succeeded`
   - `poll_until_complete`
   - `on_kill`
   
   Every method repeats `if self._hook._is_yarn_cluster_mode: ... if self._hook._is_kubernetes: ... else (standalone)`.
   A single backend's behaviour is therefore spread across six methods, which makes
   the flow hard to follow, easy to break when adding a backend, and awkward to test
   in isolation.
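
A simplified illustration of the shape described above, not the actual operator code. `BranchingOperator`, the returned strings, and the stand-in hook are hypothetical; only the two flag names and the method names come from this issue.

```python
from types import SimpleNamespace


class BranchingOperator:
    """Simplified stand-in, NOT the real SparkSubmitOperator."""

    def __init__(self, hook):
        self._hook = hook

    def get_job_status(self) -> str:
        # Three-way backend branch, inline in the method.
        if self._hook._is_yarn_cluster_mode:
            return "yarn: status lookup"
        if self._hook._is_kubernetes:
            return "k8s: status lookup"
        return "standalone: status lookup"

    def on_kill(self) -> str:
        # The same branch again; a fourth backend means editing every method.
        if self._hook._is_yarn_cluster_mode:
            return "yarn: kill"
        if self._hook._is_kubernetes:
            return "k8s: kill"
        return "standalone: kill"


# Stand-in for the hook; only the two flags named in this issue are modelled.
op = BranchingOperator(SimpleNamespace(_is_yarn_cluster_mode=True, _is_kubernetes=False))
print(op.get_job_status(), "|", op.on_kill())
```

Following one backend end to end here means reading a slice of every method, which is the readability and testing cost this issue describes.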
   
   ## Proposed change
   
   Separate each backend's logic so it is cohesive, for example with a per-backend
   strategy/handler class (standalone / YARN / K8s) implementing a common interface
   (`submit_job`, `get_job_status`, `is_job_active`, `is_job_succeeded`,
   `poll_until_complete`, `on_kill`), with the operator selecting the handler based
   on deploy mode and tracking flags. A lighter alternative is grouping each
   backend's branch into dedicated private methods. Decide between the two during
   design.
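
A minimal sketch of the strategy/handler option, assuming nothing about the final design. The class names, `select_backend`, the placeholder return values, and the stand-in hook are all hypothetical; only a subset of the six interface methods is shown, and the two flags are the ones the current branching uses.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class ResumableBackend(ABC):
    """Hypothetical per-backend interface (subset of the methods listed above)."""

    name: str

    @abstractmethod
    def submit_job(self) -> str:
        """Submit the job and return a backend-specific job id."""

    @abstractmethod
    def get_job_status(self, job_id: str) -> str:
        """Return the backend's status string for job_id."""

    @abstractmethod
    def is_job_active(self, job_id: str) -> bool:
        """Return True while the job has not reached a terminal state."""


class _StubBackend(ResumableBackend):
    # Placeholder behaviour only: real handlers would delegate to the hook.
    def submit_job(self) -> str:
        return f"{self.name}-job-1"

    def get_job_status(self, job_id: str) -> str:
        return "RUNNING"  # placeholder status, not a real backend state

    def is_job_active(self, job_id: str) -> bool:
        return self.get_job_status(job_id) == "RUNNING"


class StandaloneBackend(_StubBackend):
    name = "standalone"


class YarnBackend(_StubBackend):
    name = "yarn"


class KubernetesBackend(_StubBackend):
    name = "kubernetes"


def select_backend(hook) -> ResumableBackend:
    """Pick the handler once, using the flags the mixin branches on today."""
    if hook._is_yarn_cluster_mode:
        return YarnBackend()
    if hook._is_kubernetes:
        return KubernetesBackend()
    return StandaloneBackend()


# Stand-in for the hook; only the two flags named in this issue are modelled.
hook = SimpleNamespace(_is_yarn_cluster_mode=False, _is_kubernetes=True)
backend = select_backend(hook)
job_id = backend.submit_job()
print(backend.name, job_id, backend.is_job_active(job_id))
```

With this shape the operator's mixin methods would delegate to `backend`, and each handler could be unit-tested without constructing the operator. The private-method alternative keeps a single class but still needs one dispatch point to get the same effect.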
   
   ## Acceptance criteria
   
   - Per-backend logic is cohesive (one class or one method group per backend), not
     interleaved across the mixin methods.
   - Backend selection happens once instead of being re-derived in every method.
   - Existing behaviour is unchanged; current tests pass and per-backend logic is
     unit-testable in isolation.
   - No public API change to `SparkSubmitOperator`.
   
   ## Notes
   
   - Non-breaking, internal refactor — target after the 3.3.0 release.

