vlieven opened a new issue #18501: URL: https://github.com/apache/airflow/issues/18501
### Apache Airflow version

2.1.2

### Operating System

Debian 10

### Versions of Apache Airflow Providers

apache-airflow-providers-apache-spark

### Deployment

Other Docker-based deployment

### Deployment details

Deployed on AWS EKS (Kubernetes version 1.21), backed by an RDS database (Postgres API). Using the Kubernetes Executor.

### What happened

I cleared roughly 2,000 runs of the same DAG in order to reprocess a dataset. This caused 10,000+ tasks to switch to the "none" state. The number of tasks in this DAG that are allowed to run concurrently is fairly limited (`task_concurrency=2` for the tasks, `max_active_runs=3` for the DAG). These limits appear to be honoured, and no excessive number of tasks is being scheduled by this DAG. However, other DAGs were prevented from running their tasks. Given that no task limits were being hit, I suspect the 10,000 tasks in the "none" state keep the scheduler over-occupied, so no useful work actually gets scheduled.

### What you expected to happen

I expected the backfilling process not to block other DAGs from scheduling tasks, potentially by having the scheduler ignore tasks that violate the `max_active_runs` limit.

### How to reproduce

- Take a DAG with 2,000 DagRuns and 5+ tasks per run (a minimal sketch of such a DAG, and of the clearing step, follows at the end of this report)
- Clear the state of all 2,000 DagRuns
- Observe starvation of other DAGs trying to schedule tasks concurrently with the backfilling

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
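### Reproduction sketch

For concreteness, here is a minimal sketch of a DAG shaped like the one described above. The `dag_id`, schedule, and task callables are placeholders; the relevant pieces are `task_concurrency=2` on each task and `max_active_runs=3` on the DAG, both supported in Airflow 2.1:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical DAG mirroring the shape described above:
# max_active_runs caps concurrent DagRuns for this DAG, while
# task_concurrency caps concurrent instances of each task across runs.
with DAG(
    dag_id="reprocess_dataset",  # placeholder name
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",  # ~2,000 daily runs by late 2021
    max_active_runs=3,
    catchup=True,
) as dag:
    for i in range(5):  # 5+ tasks per run, as in the repro steps
        PythonOperator(
            task_id=f"step_{i}",
            python_callable=lambda: None,  # placeholder work
            task_concurrency=2,
        )
```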
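The runs can be cleared in bulk either via the UI's "Clear" action or with the CLI; the effect is the same. For example (the dates below are placeholders covering all runs):

```bash
# Clear all task instances of the DAG in the given window, which resets
# them (and their DagRuns) so the scheduler picks them up again.
airflow tasks clear reprocess_dataset \
    --start-date 2016-01-01 \
    --end-date 2021-09-01 \
    --yes
```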
