I have a large DAG (32 tasks) with concurrency=2 and max_active_runs=1.
Most of the tasks also use a redshift_pool, and this is running with the
LocalExecutor on Airflow 1.8.0RC4.
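
For reference, the DAG is set up roughly like this (heavily simplified:
one placeholder task instead of 32, and run_query stands in for our
actual query callables):

import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def run_query(**kwargs):
    # stand-in for the code that actually runs a Redshift query
    pass


default_args = {
    'owner': 'etl',
    'start_date': datetime.datetime(2017, 2, 1),
}

dag = DAG(
    dag_id='etl_queries_v3',
    schedule_interval='0 7 * * *',  # the runs below are for 07:00:00
    concurrency=2,       # at most 2 task instances of this DAG at once
    max_active_runs=1,   # only one active DAG run at a time
    default_args=default_args,
)

a_user_day_v2_query = PythonOperator(
    task_id='a_user_day_v2_query',
    python_callable=run_query,
    pool='redshift_pool',  # most of the 32 tasks share this pool
    dag=dag,
)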

When the DAG kicks off, things generally seem to work, but a few of the
tasks get moved to queued status (appropriately) and then never actually
start. Looking in the logs I see:

[2017-02-27 13:20:10,349] {base_task_runner.py:95} INFO - Subtask:
[2017-02-27 13:20:10,348] {models.py:1128} INFO - Dependencies all met for
<TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00
[queued]>
[2017-02-27 13:20:10,356] {base_task_runner.py:95} INFO - Subtask:
[2017-02-27 13:20:10,356] {models.py:1122} INFO - Dependencies not met for
<TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00
[queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum
number of running tasks (etl_queries_v3) for this task's DAG '2' has been
reached.
[2017-02-27 13:20:14,444] {jobs.py:2062} INFO - Task exited with return
code 0

and then that's it; the queued task is never picked up again. It has been
a different task each day, which makes me suspect some sort of scheduling
race condition. And because the tasks end up queued rather than failed,
the DAG run never finishes (so this morning our DAG didn't kick off,
because yesterday's run was still technically "running").
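
For now, manually clearing the stuck task instance seems to be the only
way to unstick things, something along the lines of:

airflow clear etl_queries_v3 -t 'a_user_day_v2_query' \
    -s 2017-02-26T07:00:00 -e 2017-02-26T07:00:00

(the task id and dates there are just the example from the log above),
but that's obviously a workaround, not a fix.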

Any thoughts/advice? (I also opened
https://github.com/apache/incubator-airflow/pull/2109 to fix the
formatting of that error message.)

Thanks,
 - Vijay Ramesh
