Vijay Krishna Ramesh created AIRFLOW-931:
--------------------------------------------

             Summary: LocalExecutor fails to run queued task with race condition
                 Key: AIRFLOW-931
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-931
             Project: Apache Airflow
          Issue Type: Bug
    Affects Versions: Airflow 1.8, 1.8.0rc4
            Reporter: Vijay Krishna Ramesh


https://gist.github.com/vijaykramesh/707262c83429ab2a3f5ee701879813e3 provides 
a small example that consistently hits this problem with LocalExecutor.

Basically when the dag run kicks off (with concurrency > 1) and a LocalExecutor 
with parallelism > 2 the scheduler marks more than concurrency tasks as queued 
(https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1095)

There is a second check before actually running the task 
(https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1291)
 that leaves the task in the QUEUED state but then the scheduler never picks it 
back up.  This causes the DAG to get stuck (as the queued tasks never run) 
until the scheduler is restarted (at which point the enqueued tasks are 
considered orphaned, the status is set to NONE, and then they are picked up by 
the scheduler again and run.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to