kiranphagura opened a new issue, #27669: URL: https://github.com/apache/airflow/issues/27669
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

(v2.2.5)

When using the AzureBatchOperator, we are trying to run 10 parallel tasks in Airflow, one on each of 10 different nodes in an Azure Batch pool. Our Azure Batch pool has been set up to run 1 task per node, so the 10 tasks are spread evenly across the nodes in the pool.

We have tested running the tasks in our Batch pool without using Airflow, and we can confirm this runs as expected (10 tasks running in parallel, with 1 task on each node in the pool and all nodes running at the same time).

However, when using the operator in Airflow, we find that the tasks do not run in parallel. A single task runs on 1 node while the other 9 tasks wait; only when the first task is complete does the next one start.

The message we see in the task logs in Airflow is: 'waiting for all nodes in the pool to reach a state of: start_task_failed, unusable, or idle'. (This is also visible in the Airflow source code for the operator.)

This suggests that, for the next task to run, all nodes need to stop running and become idle again. I believe there should be an extra node state in here for running nodes, i.e. even if some nodes in the pool are running, the next task can execute on another node in the pool and run in parallel.
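To illustrate the suspected cause, here is a minimal sketch of the check as I understand it. It is a simplified model, not the provider's actual code: `NodeState` and `pool_is_ready` are hypothetical stand-ins, and it assumes the wait only succeeds when every node in the pool is in one of the accepted states.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical stand-in for a few batch_models.ComputeNodeState values."""
    IDLE = "idle"
    RUNNING = "running"
    START_TASK_FAILED = "start_task_failed"
    UNUSABLE = "unusable"


# States the operator waits for, according to the log message quoted above.
CURRENT_STATES = {NodeState.START_TASK_FAILED, NodeState.UNUSABLE, NodeState.IDLE}
# The same set with the extra state suggested for nodes that are busy.
PROPOSED_STATES = CURRENT_STATES | {NodeState.RUNNING}


def pool_is_ready(node_states, accepted):
    """Return True only when every node in the pool is in an accepted state."""
    return all(state in accepted for state in node_states)


# A 10-node pool in which one node is already busy with the first Airflow task.
pool = [NodeState.RUNNING] + [NodeState.IDLE] * 9

print(pool_is_ready(pool, CURRENT_STATES))   # False: the second task keeps waiting
print(pool_is_ready(pool, PROPOSED_STATES))  # True: the second task could proceed
```

Under that assumption, a single running node is enough to hold back every other task until the pool is fully idle again, which matches the serial behaviour we observe.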
Reference: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_modules/airflow/providers/microsoft/azure/operators/batch.html#AzureBatchOperator

### What you think should happen instead

In the operator source (https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_modules/airflow/providers/microsoft/azure/operators/batch.html#AzureBatchOperator), I believe another state should be added:

    self.hook.wait_for_all_node_state(
        self.batch_pool_id,
        {
            batch_models.ComputeNodeState.start_task_failed,
            batch_models.ComputeNodeState.unusable,
            batch_models.ComputeNodeState.idle,
            batch_models.ComputeNodeState.running,
        },
    )

With the current setup of this operator, it is impossible to get all tasks to run in parallel in Azure Batch.

### How to reproduce

Create a DAG with multiple parallel tasks, all using the AzureBatchOperator, ensuring the Batch pool set up in Azure is set to a 'Spread' configuration.

### Operating System

Linux

### Versions of Apache Airflow Providers

v2.2.5

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Stable Airflow helm chart on Azure Kubernetes Cluster: https://airflow-helm.github.io/charts

### Anything else

Issue occurs in each run.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
