[ https://issues.apache.org/jira/browse/AIRFLOW-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903330#comment-15903330 ]
ASF subversion and git services commented on AIRFLOW-931:
---------------------------------------------------------
Commit e42398100a3248eddb6b511ade73f6a239e58090 in incubator-airflow's branch
refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e423981 ]
[AIRFLOW-931] Do not set QUEUED in TaskInstances
The contract of TaskInstances stipulates that end states for Tasks
can only be UP_FOR_RETRY, SUCCESS, FAILED, UPSTREAM_FAILED or
SKIPPED. If the concurrency limit was reached, task instances set
themselves to QUEUED. This prevented the scheduler from picking
them up again.
We now set the state to NONE instead, to ensure integrity.
Closes #2127 from bolkedebruin/AIRFLOW-931
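The change described in the commit message can be illustrated with a minimal
sketch. This is not the Airflow code: ToyTaskInstance, try_run, schedulable
and the "fixed" flag are hypothetical names, and the rule that the scheduler
only considers task instances with no state is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for Airflow's state names; None plays the role of NONE.
QUEUED = "queued"
RUNNING = "running"


@dataclass
class ToyTaskInstance:
    task_id: str
    state: Optional[str] = QUEUED  # assume the scheduler has already queued it

    def try_run(self, running_count: int, concurrency: int, fixed: bool) -> bool:
        """Second concurrency check, made just before the task would start."""
        if running_count >= concurrency:
            # Old behaviour: stay QUEUED. New behaviour: hand back as NONE.
            self.state = None if fixed else QUEUED
            return False
        self.state = RUNNING
        return True


def schedulable(tis: List[ToyTaskInstance]) -> List[str]:
    """Assumption: the toy scheduler only considers stateless task instances."""
    return [ti.task_id for ti in tis if ti.state is None]
```

With fixed=False a task instance rejected by the check stays QUEUED and
schedulable() never returns it; with fixed=True it comes back as NONE and is
eligible on the next scheduler pass.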
> LocalExecutor fails to run queued task with race condition
> ----------------------------------------------------------
>
> Key: AIRFLOW-931
> URL: https://issues.apache.org/jira/browse/AIRFLOW-931
> Project: Apache Airflow
> Issue Type: Sub-task
> Affects Versions: Airflow 1.8, 1.8.0rc4
> Reporter: Vijay Krishna Ramesh
> Assignee: Bolke de Bruin
>
> https://gist.github.com/vijaykramesh/707262c83429ab2a3f5ee701879813e3
> provides a small example that consistently hits this problem with
> LocalExecutor.
> Basically, when the DAG run kicks off (with concurrency > 1) on a
> LocalExecutor with parallelism > 2, the scheduler marks more tasks as
> queued than the concurrency limit allows
> (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1095).
> There is a second check before actually running the task
> (https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1291)
> that leaves the task in the QUEUED state, but the scheduler then never picks
> it back up. This causes the DAG to get stuck (as the queued tasks never run)
> until the scheduler is restarted, at which point the queued tasks are
> considered orphaned, their state is set to NONE, and they are picked up
> by the scheduler again and run.
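The sequence in the issue description (over-queuing, the second check, the
stuck DAG, recovery on restart) can be walked through with a toy simulation.
Everything here is an assumption made for illustration: the function names
are hypothetical, the race is modelled by a scheduler that queues every
stateless task without checking concurrency, and a scheduler restart is
modelled as resetting all QUEUED tasks to NONE.

```python
from typing import Dict, List, Optional

QUEUED, RUNNING, SUCCESS = "queued", "running", "success"
States = Dict[str, Optional[str]]


def scheduler_pass(states: States) -> List[str]:
    """Queue every task with no state (models the over-queuing)."""
    picked = [t for t, s in states.items() if s is None]
    for t in picked:
        states[t] = QUEUED
    return picked


def executor_run(states: States, task_ids: List[str], concurrency: int) -> None:
    """Start tasks; the second check leaves rejected tasks QUEUED."""
    for t in task_ids:
        running = sum(1 for s in states.values() if s == RUNNING)
        if running < concurrency:
            states[t] = RUNNING


def finish_running(states: States) -> None:
    """Mark every running task as finished."""
    for t, s in states.items():
        if s == RUNNING:
            states[t] = SUCCESS


def reset_orphaned(states: States) -> None:
    """Models a scheduler restart: QUEUED tasks go back to NONE."""
    for t, s in states.items():
        if s == QUEUED:
            states[t] = None


def simulate(concurrency: int = 2):
    states: States = {"a": None, "b": None, "c": None, "d": None}
    executor_run(states, scheduler_pass(states), concurrency)
    finish_running(states)
    stuck_pick = scheduler_pass(states)      # nothing is eligible: DAG is stuck
    reset_orphaned(states)                   # "restart the scheduler"
    recovered_pick = scheduler_pass(states)  # the leftover tasks are found again
    executor_run(states, recovered_pick, concurrency)
    finish_running(states)
    return stuck_pick, recovered_pick, states
```

Calling simulate() gives an empty pick while the DAG is stuck, then picks up
the two leftover tasks after the reset, and all four tasks end in SUCCESS.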