qcha41 opened a new issue, #38954:
URL: https://github.com/apache/airflow/issues/38954

   ### Apache Airflow version
   
   2.9.0
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   When you launch a backfill with the Airflow CLI, it first triggers a batch 
of dag runs in parallel (the maximum number is set by the max_active_runs 
parameter). The problem is that the backfill scheduler waits for every dag run 
in that first batch to finish before triggering a new batch of dag runs. 
   
   This means that if the dag runs do not all take the same time to complete, 
processing time is wasted waiting for the slowest dag run in the batch before 
new ones are triggered, which seriously increases the overall execution time 
of the backfill operation.
   
   I don't understand why backfill jobs do not behave the way clearing multiple 
tasks does: as soon as a slot becomes free, the next pending task is triggered 
immediately.
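
   To make the cost concrete, here is a minimal sketch that compares the two strategies on a toy model. It is not Airflow's scheduler code: the function names, the durations, and the slot count are all hypothetical, and each dag run is treated as a single unit of work.

```python
import heapq
from typing import List


def batch_makespan(durations: List[int], slots: int) -> int:
    """Total time when a new batch starts only after the whole previous batch ends."""
    total = 0
    for start in range(0, len(durations), slots):
        # Each batch lasts as long as its slowest run.
        total += max(durations[start:start + slots])
    return total


def slot_refill_makespan(durations: List[int], slots: int) -> int:
    """Total time when the next run starts as soon as any slot frees up."""
    free_at = [0] * slots  # min-heap of the times at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical run durations in seconds: one slow run per group of four.
durations = [60, 5, 5, 5, 60, 5, 5, 5]
slots = 4  # stands in for max_active_runs

print(batch_makespan(durations, slots))        # 120
print(slot_refill_makespan(durations, slots))  # 65
```

   In this toy example the batch strategy takes almost twice as long, because three of the four slots sit idle while the slow run of each batch finishes.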
   
   ### What you think should happen instead?
   
   Backfill slot allocation should work the same way as the classic "clear" 
slot allocation: a new dag run should be triggered as soon as a previous one 
finishes, to avoid losing time unnecessarily.
   
   ### How to reproduce
   
   dag.py
   ```python
   from datetime import datetime, timedelta
   from random import randint
   from time import sleep
   from airflow.decorators import dag, task
   
   # Define the default arguments
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'start_date': datetime(2024, 4, 11),
       'retries': 1,
       'retry_delay': timedelta(minutes=5),
   }
   
   # Define the Python function to wait for a random time
   @task
   def random_wait():
       wait_time = randint(1, 60)
       print(f"Waiting for {wait_time} seconds...")
       sleep(wait_time)
   
   # Define the DAG
   @dag(
       default_args=default_args,
       schedule_interval=timedelta(minutes=1),
       dag_id='random_wait_dag',
   )
   def random_wait_dag():
       task_1 = random_wait()
       task_2 = random_wait()
       task_3 = random_wait()
   
       task_1 >> task_2 >> task_3
   
   dag = random_wait_dag()
   ```
   
   Start backfill
   ```bash
   docker compose run airflow-cli dags backfill \
       -s 2024-04-11T00:00:00 -e 2024-04-11T01:00:00 random_wait_dag
   ```
   
   Observe behaviour
   
![1](https://github.com/apache/airflow/assets/31767137/6370fca4-e5af-495f-b307-18fbc6b6bb0e)
   
![4](https://github.com/apache/airflow/assets/31767137/b2f13bb6-b2d4-4008-9e8d-8ef21b120434)
   
![6](https://github.com/apache/airflow/assets/31767137/7c7743b6-0174-4cc0-97dd-ed6b0c0ef95f)
   
![7](https://github.com/apache/airflow/assets/31767137/999c85fd-56a4-4717-b6ab-a375b1090b07)
   
![8](https://github.com/apache/airflow/assets/31767137/7630a98c-b2b4-4a48-a8a3-3480d32e4da0)
   
   
   ### Operating System
   
   debian
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

