I have a large DAG (32 tasks) with concurrency=2 and max_active_runs=1. Most of the tasks also use a redshift_pool, and this is running the LocalExecutor on 1.8.0RC4.
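For reference, the DAG is configured roughly like this (a trimmed-down sketch, not the real DAG file — the schedule, callable, and default_args are placeholders, but the concurrency settings and pool match what we run):

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Illustrative stand-in for the real query logic.
    def run_query(**kwargs):
        pass

    default_args = {
        'owner': 'etl',
        'start_date': datetime(2017, 2, 26),
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        'etl_queries_v3',
        default_args=default_args,
        schedule_interval='0 7 * * *',
        concurrency=2,        # at most 2 task instances of this DAG running at once
        max_active_runs=1,    # only one active DAG run at a time
    )

    # ~32 tasks like this one, most of them assigned to the shared Redshift pool
    query_task = PythonOperator(
        task_id='a_user_day_v2_query',
        python_callable=run_query,
        provide_context=True,
        pool='redshift_pool',
        dag=dag,
    )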
When the DAG kicks off, things generally seem to function, but a few of the tasks get moved to queued status (appropriately) and then never actually start. Looking in the logs I see:

    [2017-02-27 13:20:10,349] {base_task_runner.py:95} INFO - Subtask: [2017-02-27 13:20:10,348] {models.py:1128} INFO - Dependencies all met for <TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00 [queued]>
    [2017-02-27 13:20:10,356] {base_task_runner.py:95} INFO - Subtask: [2017-02-27 13:20:10,356] {models.py:1122} INFO - Dependencies not met for <TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00 [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum number of running tasks (etl_queries_v3) for this task's DAG '2' has been reached.
    [2017-02-27 13:20:14,444] {jobs.py:2062} INFO - Task exited with return code 0

and then that's it — the queued task is never picked up again. It has been different tasks each day, which makes me suspect some sort of scheduling race condition. And because they are left queued, not failed, the DAG run never finishes (so this morning our DAG didn't kick off, because yesterday's run was still technically "running"). Any thoughts/advice?

(I also opened https://github.com/apache/incubator-airflow/pull/2109 to fix the formatting of that error message.)

Thanks,
- Vijay Ramesh
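P.S. In case anyone wants to check for the same symptom, something like the following against the metadata DB should list task instances that have been sitting in queued — a rough sketch using Airflow's own models; the one-hour cutoff is arbitrary:

    from datetime import datetime, timedelta

    from airflow import settings
    from airflow.models import TaskInstance
    from airflow.utils.state import State

    session = settings.Session()

    # Task instances queued for over an hour are likely stuck, since the
    # DAG/pool slots should have freed up well before then.
    cutoff = datetime.utcnow() - timedelta(hours=1)
    stuck = (
        session.query(TaskInstance)
        .filter(TaskInstance.state == State.QUEUED)
        .filter(TaskInstance.queued_dttm < cutoff)
        .all()
    )
    for ti in stuck:
        print(ti.dag_id, ti.task_id, ti.execution_date, ti.state)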