ldacey commented on issue #9975: URL: https://github.com/apache/airflow/issues/9975#issuecomment-844122190
This issue impacted me as well recently (I cleared 450 historical tasks to reprocess data). Is there any way to improve how the scheduler becomes deadlocked when you clear a lot of tasks? In my case, on 2.0.2:

1. None of my DAGs would actually complete, even when all of their tasks succeeded.
2. Many of my DAGs would not start at all.

The root cause was, of course, clearing 450 tasks all at once, but I had `depends_on_past=True` and `concurrency=1` set, and each task took 20 minutes to complete, so I was stuck. This was exacerbated by all other DAGs failing to complete (the run state stayed "running" even after all tasks finished). I ended up having to mark all of those DAG runs successful in the UI, and then restart Airflow, before DAG runs were marked complete again.

I will refrain from clearing so many tasks at once next time, but perhaps Airflow could handle this situation better? Maybe a "queued" state at the DAG level, where a DAG run is only considered running once one or more of its tasks are running? A deadlock is frustrating to deal with.
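For context, a minimal sketch of the kind of DAG configuration that produced this on Airflow 2.0.x (the DAG id, schedule, and task are illustrative, not my actual pipeline):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hourly_reprocess",      # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    concurrency=1,                  # only one task instance of this DAG runs at a time
    default_args={
        "depends_on_past": True,    # each task waits on its own previous run
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    # Stand-in for a ~20 minute processing task.
    process = BashOperator(
        task_id="process_partition",
        bash_command="sleep 1200",
    )
```

With `depends_on_past=True` and `concurrency=1`, clearing hundreds of historical task instances (e.g. via `airflow tasks clear -s <start> -e <end> hourly_reprocess`) forces them to re-run strictly one at a time, oldest first, which is what backed everything up.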

