kiranphagura opened a new issue, #27669:
URL: https://github.com/apache/airflow/issues/27669

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   (v2.2.5)
   When using the AzureBatchOperator, we are trying to run 10 parallel tasks in 
Airflow, each on a different node in an Azure Batch pool. Our Azure Batch 
pool has been set up to run 1 task per node, so the 10 tasks are spread evenly 
across the nodes in the pool.
   We have tested running the tasks in our Batch pool without using Airflow - 
we can confirm this runs as expected (10 tasks running in parallel, with 1 task 
being run on each node in the pool - all nodes running at the same time).
   
   However, when using the operator in Airflow, we find that the tasks do not 
run in parallel. A single task runs on 1 node, while the other 9 tasks wait 
until the first task is complete, and only then does the next one run. The 
message we see in the Airflow task logs is: 'waiting for all nodes in the pool 
to reach a state of: start_task_failed, unusable, or idle' (also seen in the 
Airflow source code for the operator). This suggests that for the next task to 
run, all nodes must stop running and become idle again. I believe there should 
be an extra node state here for running nodes, i.e. even if some nodes in the 
pool are running, the next task can still execute on another node in the pool 
and run in parallel.
   
   Reference: 
https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_modules/airflow/providers/microsoft/azure/operators/batch.html#AzureBatchOperator
   
   ### What you think should happen instead
   
   In the operator source: 
https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_modules/airflow/providers/microsoft/azure/operators/batch.html#AzureBatchOperator
   I believe another state (running) should be added to this call:
           self.hook.wait_for_all_node_state(
               self.batch_pool_id,
               {
                   batch_models.ComputeNodeState.start_task_failed,
                   batch_models.ComputeNodeState.unusable,
                   batch_models.ComputeNodeState.idle,
                   batch_models.ComputeNodeState.running,
               },
           )
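
To illustrate why the current set of states blocks parallel tasks, here is a minimal, self-contained sketch. It does not use the Azure SDK or Airflow: `NodeState` and `all_nodes_ready` are hypothetical stand-ins, and the check assumes (based on the log message above) that the hook waits until every node in the pool is in one of the allowed states.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical stand-in for a subset of batch_models.ComputeNodeState."""

    IDLE = "idle"
    RUNNING = "running"
    UNUSABLE = "unusable"
    START_TASK_FAILED = "start_task_failed"


def all_nodes_ready(node_states, allowed_states):
    """Assumed wait condition: every node must be in one of the allowed states."""
    return all(state in allowed_states for state in node_states)


# Pool of 10 nodes: one is busy with the first Airflow task, nine are free.
pool = [NodeState.RUNNING] + [NodeState.IDLE] * 9

# States the operator currently waits for, and the set proposed in this issue.
current_allowed = {NodeState.START_TASK_FAILED, NodeState.UNUSABLE, NodeState.IDLE}
proposed_allowed = current_allowed | {NodeState.RUNNING}

# With the current set, one running node is enough to keep the other tasks waiting.
blocked_now = not all_nodes_ready(pool, current_allowed)
# With running added, the same pool passes the check immediately.
blocked_after = not all_nodes_ready(pool, proposed_allowed)

print(blocked_now, blocked_after)
```

Under these assumptions, a single running node is enough to make the second task wait, even though nine nodes are idle and could accept it.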
   
   
   With the current setup of this operator, it is impossible to get all tasks 
to run in parallel in Azure Batch.
   
   ### How to reproduce
   
   Create a DAG with multiple parallel tasks, all using the AzureBatchOperator, 
and ensure the Batch pool in Azure is set to a 'Spread' configuration. 
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   v2.2.5
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Stable Airflow helm chart on Azure Kubernetes Cluster : 
https://airflow-helm.github.io/charts
   
   ### Anything else
   
   Issue occurs in each run.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
