AndyN5 opened a new issue, #45840: URL: https://github.com/apache/airflow/issues/45840
### Apache Airflow version

2.10.4

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

The pull request addresses the addition of deferrable functionality to the DataprocJobBaseOperator and DataprocSubmitJobOperator in Apache Airflow. This change allows for asynchronous job submission and improved handling of long-running tasks within Google Cloud Dataproc. As a result, the operators are now more efficient and capable of deferring job execution, providing better resource management and flexibility in scheduling.

Before this PR, the operators did not support deferrable capabilities, which meant that they could not handle longer-running jobs in a non-blocking way. This led to unnecessary blocking of resources and poor scalability for workflows that required long-running cloud tasks, such as large data processing jobs on Dataproc.

### What you think should happen instead?

The updated DataprocSubmitJobOperator and DataprocJobBaseOperator should now allow asynchronous job submission, enabling long-running tasks like Dataproc jobs to run without blocking other tasks in the Airflow DAG. When the operator is used, it should return control to the workflow immediately while the job continues to run in the background. The operator should be able to properly defer job execution until it's ready to resume and track the status of the job.

### How to reproduce

1. Set up an Airflow environment with the DataprocSubmitJobOperator and DataprocJobBaseOperator from the Google Cloud provider.
2. Create a DAG that uses one of these operators to submit a Dataproc job (such as a Spark or Hadoop job) without the asynchronous capability.
3. Run the DAG.
4. Observe that the task execution blocks the workflow until the Dataproc job finishes, leading to inefficiency or task timeouts if the job runs for a long time.
5. Now, apply the changes from the PR (DataprocSubmitJobOperator with async capability), which allows jobs to run asynchronously.
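
To make the blocking versus deferred distinction in the steps above concrete, here is a minimal sketch in plain Python `asyncio`. It is not Airflow or Google Cloud code: `FakeJob`, `wait_blocking`, `wait_deferred`, and `run_many` are hypothetical names invented for illustration, and the "job" is just a timer. It only shows the general idea that a deferred wait yields control between status checks, so several long-running jobs can be awaited at once.

```python
import asyncio
import time


class FakeJob:
    """Hypothetical stand-in for a long-running job (not a real Dataproc API)."""

    def __init__(self, duration: float) -> None:
        # The job counts as finished once this monotonic timestamp has passed.
        self._done_at = time.monotonic() + duration

    def is_done(self) -> bool:
        return time.monotonic() >= self._done_at


def wait_blocking(job: FakeJob, poll_interval: float = 0.01) -> str:
    """Blocking style: the caller is occupied until the job finishes."""
    while not job.is_done():
        time.sleep(poll_interval)
    return "DONE"


async def wait_deferred(job: FakeJob, poll_interval: float = 0.01) -> str:
    """Deferred style: each sleep yields control so other waits can progress."""
    while not job.is_done():
        await asyncio.sleep(poll_interval)
    return "DONE"


async def run_many(durations):
    """Wait on several jobs concurrently; return their states and elapsed time."""
    start = time.monotonic()
    jobs = [FakeJob(d) for d in durations]
    states = await asyncio.gather(*(wait_deferred(job) for job in jobs))
    return list(states), time.monotonic() - start


if __name__ == "__main__":
    states, elapsed = asyncio.run(run_many([0.1, 0.1, 0.1]))
    print(states, f"elapsed={elapsed:.2f}s")
```

With the deferred wait, three jobs of equal length finish in roughly the time of one, whereas calling `wait_blocking` on each in turn would take about three times as long.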

The task should return control to the DAG immediately and defer until the job completes. Compare the behavior before and after the patch. The issue should manifest when trying to run long jobs without the asynchronous execution mode, where the task won't release control until the job completes. After applying the patch, the behavior should change with more efficient handling of long-running tasks.

### Operating System

Windows

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
