[ 
https://issues.apache.org/jira/browse/AIRFLOW-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759359#comment-15759359
 ] 

Bolke de Bruin commented on AIRFLOW-695:
----------------------------------------

OK, I think I have figured out the issue. The scheduler checks the task instances 
without taking into account whether the executor has already reported back. In this 
case the executor reports back several iterations later. Because a task will 
not enter the queue while it is still considered running, the task state stays 
"queued" indefinitely, in limbo between the scheduler and the executor.

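To make the race a bit more concrete, here is a rough toy model of it. This is only a sketch and not Airflow's actual scheduler or executor code: ToyExecutor, simulate, report_delay and the state strings are all hypothetical names made up for illustration. The assumption is that the executor refuses to queue a key it still considers running, and that the scheduler acts on the task state alone.

```python
class ToyExecutor:
    """Hypothetical stand-in for an executor; not Airflow's actual class."""

    def __init__(self, report_delay):
        self.report_delay = report_delay  # iterations before a result is reported back
        self.running = set()              # keys the executor still considers running
        self._pending = {}                # key -> iterations left until the report

    def queue(self, key):
        # A task that is still considered running does not enter the queue.
        if key in self.running:
            return False
        self.running.add(key)
        return True

    def finish(self, key):
        # The work is done, but the report back may lag behind.
        if self.report_delay <= 0:
            self.running.discard(key)
        else:
            self._pending[key] = self.report_delay

    def heartbeat(self):
        # One iteration passes; deliver any reports that are now due.
        for key in list(self._pending):
            self._pending[key] -= 1
            if self._pending[key] <= 0:
                del self._pending[key]
                self.running.discard(key)


def simulate(report_delay, iterations=10):
    executor = ToyExecutor(report_delay)
    key = "test_retry_handling_op"

    executor.queue(key)    # first attempt is accepted
    executor.finish(key)   # attempt fails; the report may be delayed
    state = "up_for_retry"

    for _ in range(iterations):
        # Scheduler pass: acts on the task state alone and never asks
        # whether the executor has reported back yet.
        if state == "up_for_retry":
            state = "queued"
            if executor.queue(key):
                state = "running"
        executor.heartbeat()
    return state


print(simulate(report_delay=0))  # retry is picked up
print(simulate(report_delay=3))  # retry stays stuck
```

With no delay the retry ends up "running". With a delay of three iterations it stays "queued" no matter how many iterations follow, because by the time the executor reports back nothing re-sends the task.
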
> Retries do not execute because dagrun is in FAILED state
> --------------------------------------------------------
>
>                 Key: AIRFLOW-695
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-695
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>            Reporter: Harvey Xia
>            Priority: Blocker
>              Labels: executor, scheduler
>
> Currently on the latest master commit 
> (15ff540ecd5e60e7ce080177ea3ea227582a4672), running on the LocalExecutor, 
> retries on tasks do not execute because the state of the corresponding dagrun 
> changes to FAILED. The task instance then gets blocked because "Task 
> instance's dagrun was not in the 'running' state but in the state 'failed'," 
> the error message produced by the following lines: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dagrun_exists_dep.py#L48-L50
> This error can be reproduced with the following simple DAG:
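> {code:title=DAG.py|borderStyle=solid}
> import datetime
>
> from airflow import models
> from airflow.operators.bash_operator import BashOperator
>
> # Minimal DAG: the single task always fails and is allowed one retry.
> dag = models.DAG(dag_id='test_retry_handling')
> task = BashOperator(
>     task_id='test_retry_handling_op',
>     bash_command='exit 1',  # always exits non-zero, forcing a retry
>     retries=1,
>     retry_delay=datetime.timedelta(minutes=1),
>     dag=dag,
>     owner='airflow',
>     start_date=datetime.datetime(2016, 2, 1, 0, 0, 0))
> {code}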



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
