collinmcnulty opened a new issue, #49508:
URL: https://github.com/apache/airflow/issues/49508
### Apache Airflow version
2.10.5
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
Two DAGs each receive a large batch of DAG runs. The number of runs for each
DAG exceeds `max_dagruns_per_loop_to_schedule`. Each DAG run is very short,
shorter than the heartrate of this Airflow deployment. Both DAGs have a
`max_active_runs` that is far less than `max_dagruns_per_loop_to_schedule`.
So: `max_active_runs` < `max_dagruns_per_loop_to_schedule` < number of queued
DAG runs.
In each scheduler loop the first DAG has only a very small number of free DAG run
"slots", but because it has at least one, the check
`coalesce(running_drs.c.num_running, text("0")) < coalesce(Backfill.max_active_runs, DagModel.max_active_runs)`
passes and does not exclude it. However, every DAG run the query then considers
(up to `max_dagruns_per_loop_to_schedule`) belongs to the first DAG. So the second
DAG effectively has to wait for nearly all of the first DAG's runs to complete
before any of its runs are moved from queued to running.
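To make the starvation concrete, here is a deliberately simplified toy model (it is not Airflow's scheduler code). It assumes, as described above, that all of the first DAG's queued runs sort ahead of the second DAG's, that both DAGs always pass the yes/no slot check, and that every started run finishes before the next loop. The function name `first_loop_second_dag_starts` is hypothetical.

```python
def first_loop_second_dag_starts(
    runs_per_dag: int, max_active_runs: int, per_loop_limit: int
) -> int:
    """Return the 1-based scheduler loop in which dag_b first gets a run started.

    Toy model only (hypothetical, not Airflow code); assumes positive integers.
    """
    queued_a = runs_per_dag
    queued_b = runs_per_dag
    loop = 0
    while queued_a or queued_b:
        loop += 1
        # Nothing is still running at the start of a loop, so both DAGs pass the
        # yes/no slot check and the query takes the first `per_loop_limit`
        # queued runs in sort order: all of dag_a's before any of dag_b's.
        examined_a = min(queued_a, per_loop_limit)
        examined_b = min(queued_b, per_loop_limit - examined_a)
        if examined_b > 0:
            return loop
        # Only `max_active_runs` of the examined dag_a runs can start (and they
        # finish within the loop, since runs are shorter than the heartrate).
        queued_a -= min(examined_a, max_active_runs)
    return loop
```

With the numbers from the reproduction steps, `first_loop_second_dag_starts(5000, 100, 2000)` returns `32` in this model: the second DAG gets its first run started only once fewer than 2000 of the first DAG's runs are still queued.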
### What you think should happen instead?
I think the "most correct" fix is to replace the global yes/no decision on whether
a DAG's runs are included at all (based on `max_active_runs`) with a per-DAG limit
on how many of that DAG's runs can be included. I can't see a good way to do this
in SQL, but others may have insight.
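One possible way to express a per-DAG limit in SQL is a window function: number each DAG's queued runs with `ROW_NUMBER() OVER (PARTITION BY dag_id ...)` and keep only the rows whose rank falls within that DAG's free slots. The sketch below uses Python's built-in `sqlite3` with a hypothetical, heavily simplified `dag` / `dag_run` schema, and ordering by `id` stands in for whatever ordering the real query uses. It illustrates the idea only and is not a patch against Airflow's actual query; PostgreSQL and MySQL 8.0+ also support `ROW_NUMBER()`.

```python
import sqlite3

# Hypothetical, heavily simplified tables; these are NOT Airflow's real
# `dag` / `dag_run` schemas, just enough columns to show the idea.
SCHEMA = """
CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER NOT NULL);
CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, state TEXT NOT NULL);
"""

# Rank each DAG's queued runs and keep only as many as the DAG has free slots
# (max_active_runs minus its currently running runs), then apply the global LIMIT.
QUERY = """
WITH running AS (
    SELECT dag_id, COUNT(*) AS num_running
    FROM dag_run
    WHERE state = 'running'
    GROUP BY dag_id
),
ranked AS (
    SELECT dr.id,
           dr.dag_id,
           ROW_NUMBER() OVER (PARTITION BY dr.dag_id ORDER BY dr.id) AS rn,
           d.max_active_runs - COALESCE(r.num_running, 0) AS free_slots
    FROM dag_run AS dr
    JOIN dag AS d ON d.dag_id = dr.dag_id
    LEFT JOIN running AS r ON r.dag_id = dr.dag_id
    WHERE dr.state = 'queued'
)
SELECT id, dag_id
FROM ranked
WHERE rn <= free_slots
ORDER BY id
LIMIT ?
"""


def queued_runs_to_examine(conn: sqlite3.Connection, limit: int) -> list[tuple[int, str]]:
    """Return at most `limit` queued runs, capped per DAG by its free slots."""
    return conn.execute(QUERY, (limit,)).fetchall()


# Example data: both DAGs allow 3 active runs; dag_a already has 1 running.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dag VALUES (?, ?)", [("dag_a", 3), ("dag_b", 3)])
conn.executemany(
    "INSERT INTO dag_run (id, dag_id, state) VALUES (?, ?, ?)",
    [(100, "dag_a", "running")]
    + [(i, "dag_a", "queued") for i in range(1, 11)]
    + [(i, "dag_b", "queued") for i in range(11, 21)],
)
selected = queued_runs_to_examine(conn, limit=2000)
# selected == [(1, 'dag_a'), (2, 'dag_a'), (11, 'dag_b'), (12, 'dag_b'), (13, 'dag_b')]
```

Here `dag_a` contributes only its 2 free slots and `dag_b` its 3, so a DAG with thousands of queued runs can no longer fill the whole `LIMIT` on its own.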
Alternatively, because this is predominantly a problem when a single DAG
dominates the scheduler's attention, we could add an explicit check for whether
the result of the DAG run query contains runs from only a single DAG and, if so,
re-run the query with that DAG excluded.
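A minimal sketch of that fallback, assuming a hypothetical `fetch(limit, excluded_dag_ids)` callable that wraps the existing DAG run query and returns `(dag_id, run_id)` pairs in scheduling order. Whether the runs from the first query should be kept is not specified above; this sketch assumes they are and appends the results of the second query.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a queued DagRun row: (dag_id, run_id).
QueuedRun = tuple[str, str]


def select_queued_runs(
    fetch: Callable[[int, frozenset[str]], Sequence[QueuedRun]],
    limit: int,
) -> list[QueuedRun]:
    """Fetch up to `limit` queued runs; if they all belong to one DAG,
    query again with that DAG excluded and append the extra runs."""
    first = list(fetch(limit, frozenset()))
    dag_ids = {dag_id for dag_id, _ in first}
    if len(dag_ids) != 1:
        return first
    (dominant,) = dag_ids
    second = fetch(limit, frozenset({dominant}))
    return first + list(second)


# Example: a fake queue in which all of dag_a's runs sort ahead of dag_b's.
queue = [("dag_a", f"a{i}") for i in range(10)] + [("dag_b", f"b{i}") for i in range(10)]


def fake_fetch(limit: int, excluded: frozenset[str]) -> list[QueuedRun]:
    return [run for run in queue if run[0] not in excluded][:limit]


picked = select_queued_runs(fake_fetch, limit=5)
# picked holds a0..a4 from dag_a followed by b0..b4 from dag_b
```

Note that the combined list can hold up to `2 * limit` runs; a real implementation might prefer to split the limit between the two queries instead.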
### How to reproduce
1. Create two DAGs, each with a single simple task.
2. Set `max_active_runs=100` on both DAGs.
3. Set `max_dagruns_per_loop_to_schedule=2000`.
4. Start 5000 runs of the first DAG.
5. Start 5000 runs of the second DAG.
6. (Hard to reproduce) Keep the scheduler heartrate low enough that runs
complete within one scheduler loop.
### Operating System
Debian GNU/Linux 12 (bookworm)
### Versions of Apache Airflow Providers
_No response_
### Deployment
Astronomer
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)