[ 
https://issues.apache.org/jira/browse/AIRFLOW-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759310#comment-15759310
 ] 

Bolke de Bruin commented on AIRFLOW-695:
----------------------------------------

I did some digging. I cannot replicate the behavior with the 
SequentialExecutor, but I can with the LocalExecutor (I didn't try with 
Celery). It seems that the task is still part of "self.running" when it is 
re-queued. In this state it will not be run again.

The executors have not been updated recently, so the issue must be in the 
calling functions. I haven't figured that out yet. The state change of the 
task should be caught by the "heartbeat" method calling the "sync" method of 
the executor, and the task should then be removed from "self.running". It 
seems it isn't.
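
To make the suspected failure mode concrete, here is a toy model in Python. It 
is only a sketch: ToyExecutor, report() and the "started" list are 
hypothetical names invented for illustration, and the logic is a simplified 
guess at the behavior described above, not the actual Airflow executor code.

```python
class ToyExecutor:
    """Hypothetical stand-in for an executor; NOT Airflow's real class."""

    def __init__(self):
        self.queued = {}    # key -> command waiting to be started
        self.running = {}   # key -> command believed to be in flight
        self.finished = {}  # key -> final state reported by a worker
        self.started = []   # keys in the order they were actually started

    def queue(self, key, command):
        # Assumption: a key that is already queued or running is skipped.
        if key in self.queued or key in self.running:
            return False
        self.queued[key] = command
        return True

    def report(self, key, state):
        # A worker reports the final state of a task.
        self.finished[key] = state

    def sync(self):
        # Drop every task with a reported final state from self.running.
        for key in list(self.finished):
            self.running.pop(key, None)
            del self.finished[key]

    def heartbeat(self):
        # Start everything that is queued, then pick up state changes.
        for key, command in list(self.queued.items()):
            del self.queued[key]
            self.running[key] = command
            self.started.append(key)
        self.sync()


ex = ToyExecutor()
key = ("test_retry_handling", "test_retry_handling_op")

ex.queue(key, "exit 1")
ex.heartbeat()             # first attempt starts
ex.report(key, "failed")   # the attempt fails

# The retry is queued before sync() has seen the failure: it is dropped,
# because the key is still in self.running.
stuck = ex.queue(key, "exit 1")

# Once sync() has run, the same retry is accepted.
ex.sync()
retried = ex.queue(key, "exit 1")
```

In this model "stuck" is False and "retried" is True, which matches the 
symptom: a retry that arrives while the key is still in "self.running" is 
never started.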

[~aoen] [~pauly] Maybe you guys have a clue.

> Retries do not execute because dagrun is in FAILED state
> --------------------------------------------------------
>
>                 Key: AIRFLOW-695
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-695
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>            Reporter: Harvey Xia
>            Priority: Blocker
>              Labels: executor, scheduler
>
> Currently on the latest master commit 
> (15ff540ecd5e60e7ce080177ea3ea227582a4672), running on the LocalExecutor, 
> retries on tasks do not execute because the state of the corresponding dagrun 
> changes to FAILED. The task instance then gets blocked because "Task 
> instance's dagrun was not in the 'running' state but in the state 'failed'," 
> the error message produced by the following lines: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dagrun_exists_dep.py#L48-L50
> This error can be reproduced with the following simple DAG:
> {code:title=DAG.py|borderStyle=solid}
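> import datetime
>
> from airflow import models
> from airflow.operators.bash_operator import BashOperator
>
> # Minimal DAG whose single task always fails, forcing one retry.
> dag = models.DAG(dag_id='test_retry_handling')
> task = BashOperator(
>     task_id='test_retry_handling_op',
>     bash_command='exit 1',  # always exits non-zero
>     retries=1,
>     retry_delay=datetime.timedelta(minutes=1),
>     dag=dag,
>     owner='airflow',
>     start_date=datetime.datetime(2016, 2, 1, 0, 0, 0))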
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
