vlieven opened a new issue #18501:
URL: https://github.com/apache/airflow/issues/18501


   ### Apache Airflow version
   
   2.1.2
   
   ### Operating System
   
   Debian 10
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-apache-spark
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Deployed on AWS EKS (Kubernetes version 1.21), backed by RDS database 
(Postgres API)
   Using Kubernetes Executor
   
   ### What happened
   
   I cleared approximately 2,000 runs of the same DAG in order to reprocess a 
dataset. This caused 10,000+ tasks to switch to the status "none". The number 
of tasks in this DAG that are allowed to run concurrently is fairly limited 
(task_concurrency=2 for the tasks, max_active_runs=3 for the DAG). These limits 
seem to be honoured, and this DAG does not schedule an excessive number of 
tasks.
   
   However, other DAGs were prevented from running their tasks. Given that no 
task limits were being hit, I suspect that the 10,000 tasks with status "none" 
keep the scheduler over-occupied, so that no useful work actually gets 
scheduled.
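
   To make the suspicion concrete, here is a toy model in plain Python. It is 
not Airflow's scheduler code, and it does not confirm that the scheduler 
behaves this way. It assumes a hypothetical scheduler loop that examines only a 
fixed number of candidates per pass (`examined_per_loop`, an invented 
parameter) and that the blocked backlog sorts ahead of everything else.

```python
def count_scheduled(blocked_backlog, runnable_other, examined_per_loop, loops):
    """Toy model of the suspected starvation (NOT Airflow's real logic).

    blocked_backlog:   candidates that can never run because their DAG is
                       already at its concurrency limits
    runnable_other:    candidates from other DAGs that could run right away
    examined_per_loop: hypothetical cap on candidates examined per pass
    loops:             number of scheduler passes to simulate
    """
    # Assumption: the blocked backlog is ordered ahead of the other DAGs' tasks.
    queue = ["blocked"] * blocked_backlog + ["runnable"] * runnable_other
    scheduled = 0
    for _ in range(loops):
        window = queue[:examined_per_loop]
        rest = queue[examined_per_loop:]
        # Runnable candidates inside the window get scheduled and leave the queue.
        scheduled += window.count("runnable")
        # Blocked candidates stay queued and are examined again on the next pass.
        queue = [c for c in window if c == "blocked"] + rest
    return scheduled


if __name__ == "__main__":
    # A backlog larger than the window fills every pass, so nothing else runs.
    print(count_scheduled(10_000, 50, 512, 100))
    # Without the backlog, the same 50 tasks are scheduled in a single pass.
    print(count_scheduled(0, 50, 512, 1))
```

   In this model the other DAGs make no progress for as long as the blocked 
backlog is larger than the window, which matches the behaviour described above.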
   
   ### What you expected to happen
   
   I expected the backfilling process not to block other DAGs from scheduling 
tasks, potentially by having the scheduler ignore tasks which would violate the 
`max_active_runs` limit.
   
   ### How to reproduce
   
   - Take a DAG with 2000 DagRuns and 5+ tasks per run
   - Clear all 2000 DagRuns
   - Observe starvation of other DAGs trying to schedule tasks concurrently 
with the backfilling
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   



