SameerMesiah97 opened a new pull request, #66815: URL: https://github.com/apache/airflow/pull/66815
**Description** This change introduces deferrable execution support for `AzureBatchOperator`. Previously, the operator only supported synchronous execution, where a worker process continuously polled Azure Batch until all tasks in the job reached a terminal state. This update allows the operator to defer while waiting for Batch job completion. When `deferrable=True`, the operator creates the pool, waits for nodes to become ready, creates the job and task, and then defers execution to a new `AzureBatchTrigger`. The trigger polls the Batch job state and emits a completion event when all tasks reach a terminal state or when the configured timeout is exceeded. The operator resumes in `execute_complete()` to log success or raise an exception if the Batch job failed or timed out. The implementation preserves the existing synchronous pool provisioning behavior while offloading long-running Batch job monitoring to the triggerer process. **Rationale** Azure Batch jobs can run for extended periods depending on workload size, VM provisioning time, and task execution duration. In synchronous mode the operator continuously polls Batch job state from a worker process, which can block worker slots for the duration of the run. Deferrable execution offloads this polling to the triggerer process, allowing the task to suspend while waiting for completion. **Pool provisioning and node readiness checks remain synchronous. While node provisioning may take several minutes, it is typically short relative to overall Batch job execution time. Fully deferring pool provisioning would require additional trigger complexity for relatively limited worker-slot savings.** This change also maintains consistency with other long-running operators in the Azure provider which support deferrable execution semantics. Adding deferrable support to `AzureBatchOperator` ensures that long-running Azure Batch workloads can be monitored efficiently without unnecessarily consuming worker resources. **Tests** Added operator tests verifying that: * The operator defers execution when `deferrable=True`. * `execute_complete()` logs success when the trigger reports successful completion. * `execute_complete()` raises `RuntimeError` when the trigger reports timeout or failure conditions. * `execute_complete()` raises `RuntimeError` when no event or an unexpected event payload is received. Added trigger tests verifying that: * The trigger serializes correctly with the expected classpath and arguments. * `_build_trigger_event()` returns the correct event for successful and failed Batch task states and `None` for non-terminal states. * The trigger emits success and failure events when terminal Batch task states are reached during polling. * Timeout conditions emit timeout events correctly. * Exceptions raised during polling result in an error event. **Example DAGs** An example DAG has been added demonstrating `AzureBatchOperator` running in deferrable mode. **Documentation** Docstrings for `AzureBatchOperator` and `AzureBatchTrigger` have been updated to document deferrable execution, trigger behavior, and the `deferrable` and `poll_interval` parameters. **Backwards Compatibility** Deferrable execution is opt-in and controlled by the `deferrable` parameter, which defaults to `False`. Existing DAGs continue to use synchronous execution unless `deferrable=True` is explicitly enabled. **Acknowledgements** This change builds on the earlier deferrable implementation work explored in PR #59798 by @Ankurdeewan. Closes: #59779 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
