collinmcnulty opened a new issue, #49508:
URL: https://github.com/apache/airflow/issues/49508

   ### Apache Airflow version
   
   2.10.5
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Two DAGs each receive a large batch of DAG runs. The number of runs for each 
DAG exceeds `max_dagruns_per_loop_to_schedule`. Each DAG run is very short, 
shorter than the scheduler heartrate of this Airflow deployment. Both DAGs have 
a `max_active_runs` that is far less than `max_dagruns_per_loop_to_schedule`.
   
   So: max_active_runs < max_dagruns_per_loop_to_schedule < number of queued 
DAG runs.
   
   Each scheduler loop, a small number of DAG run "slots" are free for the 
first DAG, so the check `coalesce(running_drs.c.num_running, text("0")) < 
coalesce(Backfill.max_active_runs, DagModel.max_active_runs),` passes and the 
first DAG is not excluded. But then all the DAG runs that are considered are 
from the first DAG, so the second DAG effectively has to wait for nearly all of 
the first DAG's runs to complete before any of its runs are moved from queued 
to running.
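
A minimal, pure-Python sketch of the behaviour described above. It does not use Airflow; `pick_runs_to_start` is a hypothetical stand-in for the scheduler's query, and it assumes candidates are taken in a fixed order in which the first DAG's runs sort ahead of the second DAG's.

```python
from collections import Counter


def pick_runs_to_start(queued, running, max_active_runs, per_loop_limit):
    """Hypothetical model of the selection: a DAG is eligible as a whole
    while it has fewer than max_active_runs running, and then the first
    per_loop_limit eligible runs are taken in queue order."""
    eligible = [
        (dag_id, run_id)
        for dag_id, run_id in queued
        if running.get(dag_id, 0) < max_active_runs
    ]
    return eligible[:per_loop_limit]


# Assumed workload: every run of dag_a sorts before every run of dag_b.
queued = [("dag_a", i) for i in range(5000)] + [("dag_b", i) for i in range(5000)]
running = {}  # runs finish within one loop, so none are running at query time

candidates = pick_runs_to_start(
    queued, running, max_active_runs=100, per_loop_limit=2000
)
per_dag = Counter(dag_id for dag_id, _ in candidates)
print(per_dag)  # all 2000 candidates belong to dag_a, none to dag_b
```

In this model the yes/no eligibility check never excludes `dag_a`, so `dag_b` is not considered until `dag_a`'s queue has drained below the per-loop limit.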
   
   ### What you think should happen instead?
   
   I think the "most correct" thing to do is to replace the all-or-nothing 
decision on whether a DAG is included, which is currently based on 
`max_active_runs`, with some kind of limit on the number of that DAG's runs 
that can be included. I can't see a good way to do this in SQL but others may 
have insight.
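
To make the per-DAG limit idea concrete, here is a rough sketch in plain Python rather than SQL. The function `pick_runs_with_per_dag_cap` and the data layout are hypothetical; it only illustrates the selection rule, not how it would be expressed in the scheduler's query.

```python
from collections import Counter


def pick_runs_with_per_dag_cap(queued, running, max_active_runs, per_loop_limit):
    """Take runs in queue order, but never more per DAG than that DAG's
    free slots (max_active_runs minus currently running runs)."""
    free_slots = {}
    picked = []
    for dag_id, run_id in queued:
        if len(picked) >= per_loop_limit:
            break
        if dag_id not in free_slots:
            free_slots[dag_id] = max(max_active_runs - running.get(dag_id, 0), 0)
        if free_slots[dag_id] > 0:
            picked.append((dag_id, run_id))
            free_slots[dag_id] -= 1
    return picked


# Assumed workload: every run of dag_a sorts before every run of dag_b.
queued = [("dag_a", i) for i in range(5000)] + [("dag_b", i) for i in range(5000)]
picked = pick_runs_with_per_dag_cap(
    queued, running={}, max_active_runs=100, per_loop_limit=2000
)
per_dag = Counter(dag_id for dag_id, _ in picked)
print(per_dag)  # 100 runs from dag_a and 100 runs from dag_b
```

With the cap applied per DAG, the first DAG can no longer fill the whole batch, so the second DAG gets its slots in the same loop.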
   
   Alternatively, because this is predominantly a problem when a single DAG 
dominates the scheduler's attention, we could add an explicit check to see if 
the result of the DAG run query contains only a single DAG, and if so 
re-run the query with that DAG excluded.
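
The alternative could look roughly like the following sketch. `query_runs` is a hypothetical stand-in for the DAG run query, and appending the second result to the first is an assumption; the proposal above does not say how the two results would be combined.

```python
from collections import Counter


def query_runs(queued, limit, exclude=frozenset()):
    """Hypothetical stand-in for the DAG run query: the first `limit`
    queued runs whose DAG is not in `exclude`."""
    return [(d, r) for d, r in queued if d not in exclude][:limit]


def pick_with_fallback(queued, limit):
    """If one DAG fills the whole result, query again without that DAG."""
    first = query_runs(queued, limit)
    dag_ids = {dag_id for dag_id, _ in first}
    if len(dag_ids) != 1:
        return first
    # Assumption: the second result is appended to the first.
    return first + query_runs(queued, limit, exclude=dag_ids)


# Assumed workload: every run of dag_a sorts before every run of dag_b.
queued = [("dag_a", i) for i in range(5000)] + [("dag_b", i) for i in range(5000)]
result = pick_with_fallback(queued, limit=2000)
per_dag = Counter(dag_id for dag_id, _ in result)
print(per_dag)  # 2000 runs from dag_a and 2000 runs from dag_b
```

This only helps when exactly one DAG dominates; with two dominant DAGs and a third waiting behind them, the same starvation could reappear.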
   
   ### How to reproduce
   
   1. Create two DAGs, each with a single, simple task.
   2. Set `max_active_runs=100`
   3. Set `max_dagruns_per_loop_to_schedule=2000`
   4. Start 5000 runs of the first DAG
   5. Start 5000 runs of the second DAG
   6. Hard to reproduce: keep the heartrate of the scheduler low enough that 
runs complete within one scheduler loop.
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
