amoghrajesh opened a new issue, #67168:
URL: https://github.com/apache/airflow/issues/67168
## Context
PR #67118 adds sync resumable support to `SparkSubmitOperator` via
`ResumableJobMixin`. During review, @ashb suggested also supporting a
`deferrable=True` mode in the same operator.
## What this issue tracks
Add a `deferrable: bool = False` parameter to `SparkSubmitOperator`:
- `deferrable=False` (default): sync path with `ResumableJobMixin`. The worker
  slot stays occupied during polling, and the operator reconnects to the
  existing driver after an infrastructure failure (implemented in #67118).
- `deferrable=True`: submit the job, then `defer()` to `SparkDriverTrigger`,
  which frees the worker slot during polling. If `execute()` is called again
  (this only happens when a user clears the task), resubmit fresh. No reconnect
  is needed because crashes are handled by Trigger row persistence.
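
To make the difference between the two modes concrete, here is a minimal, framework-free sketch of the control flow described above. It does not use Airflow or Spark: `SketchSparkOperator`, `Deferred`, `_submit()`, and the generated `driver-N` ids are hypothetical stand-ins invented for illustration, and the real operator in #67118 may be structured quite differently.

```python
from dataclasses import dataclass, field


class Deferred(Exception):
    """Hypothetical stand-in for handing control to a trigger (not Airflow's API)."""

    def __init__(self, trigger_name: str, job_id: str) -> None:
        super().__init__(f"deferred to {trigger_name} for {job_id}")
        self.trigger_name = trigger_name
        self.job_id = job_id


@dataclass
class SketchSparkOperator:
    """Toy model of the two modes; every name here is an assumption."""

    deferrable: bool = False
    task_state: dict = field(default_factory=dict)
    submitted: list = field(default_factory=list)

    def _submit(self) -> str:
        # Pretend to submit a Spark job and record its id in shared state.
        job_id = f"driver-{len(self.submitted) + 1}"
        self.submitted.append(job_id)
        self.task_state["spark_job_id"] = job_id
        return job_id

    def execute(self) -> str:
        if self.deferrable:
            # Deferrable path: always submit fresh, then hand off polling.
            job_id = self._submit()
            raise Deferred("SparkDriverTrigger", job_id)
        # Sync path: reconnect to a recorded driver if one exists,
        # otherwise submit a new job. Polling would happen here.
        existing = self.task_state.get("spark_job_id")
        return existing if existing is not None else self._submit()
```

In this sketch, calling `execute()` twice in sync mode reuses the recorded `spark_job_id`, while each `execute()` call in deferrable mode submits a new job before deferring.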
## What's needed
1. `SparkDriverTrigger`: polls the Spark REST API asynchronously until the
   driver reaches a terminal state
   - Standalone: `GET http://master:6066/v1/submissions/status/{driver_id}`
     via `aiohttp`
   - YARN: `GET http://rm:8088/ws/v1/cluster/apps/{app_id}` (when YARN adds
     `_should_track_driver_status=True`)
   - K8s: k8s pod phase API (when K8s adds
     `_should_track_driver_status=True`)
2. `deferrable` parameter on the operator, plus an `on_driver_finished()` callback
3. Tests
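
The polling loop in item 1 could look roughly like the sketch below for the standalone case. It is an illustration under assumptions, not the proposed implementation: `poll_until_terminal`, `standalone_status_url`, and the injected `fetch_state` coroutine are hypothetical names, and the set of terminal states is an assumption that should be checked against the Spark version in use. A real trigger would presumably perform the `GET` with `aiohttp` inside `fetch_state` and emit a trigger event instead of returning a string.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal driver states for Spark standalone mode.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "ERROR"}


def standalone_status_url(master: str, driver_id: str) -> str:
    """Build the standalone status URL in the shape quoted in this issue."""
    return f"http://{master}:6066/v1/submissions/status/{driver_id}"


async def poll_until_terminal(
    fetch_state: Callable[[str], Awaitable[str]],
    url: str,
    poll_interval: float = 5.0,
) -> str:
    """Call fetch_state(url) repeatedly until it reports a terminal state."""
    while True:
        state = await fetch_state(url)
        if state in TERMINAL_STATES:
            return state
        # Yield to the event loop between polls so other triggers can run.
        await asyncio.sleep(poll_interval)
```

Injecting `fetch_state` keeps the loop independent of the HTTP client, which makes it straightforward to test with a fake coroutine and to reuse for the YARN and K8s endpoints once they are supported.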
## Relationship to #67118
Both modes store `spark_job_id` in `task_state`, so a user can switch from
sync to deferrable without any state migration.