Rostislaw Krassow created AIRFLOW-1230:
------------------------------------------
Summary: Upstream_failed tasks are not executed when the DAG is
restarted after failure
Key: AIRFLOW-1230
URL: https://issues.apache.org/jira/browse/AIRFLOW-1230
Project: Apache Airflow
Issue Type: Bug
Components: DAG, DagRun
Affects Versions: 1.8.1, 1.8.0
Environment: CentOS release 6.8 (Final)
Python 2.7.10
Reporter: Rostislaw Krassow
Attachments: DAG_cleared.gif, DAG_failed.gif,
DAG_partly_successful.gif, new_example_bash_operator.py
The issue is reproducible with Airflow 1.8.0 and 1.8.1.
Steps to reproduce:
1. Use the attached DAG
[new_example_bash_operator|^new_example_bash_operator.py]. This is a modified
standard example DAG. The task run_before_loop will fail because it contains an
error.
2. Execute the DAG:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02
The task run_before_loop fails as expected. The DAG fails. The screenshot of
the UI is attached. !DAG_failed.gif!
All dependent tasks (runme_0, runme_1, runme_2) go to state
"upstream_failed".
3. Fix the BashOperator in the task run_before_loop (just put "echo 1" as
bash_command).
4. Execute the DAG again:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02
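The attached DAG is not reproduced in this report, but a minimal sketch of such a modified example_bash_operator might look like the following (task names taken from the report; the deliberately failing bash_command and the Airflow 1.8-era imports are assumptions):

```python
from datetime import datetime
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator

args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 5, 2),
}

dag = DAG(dag_id='new_example_bash_operator', default_args=args,
          schedule_interval='@daily')

# Deliberately broken command so the task fails on the first backfill
# (assumption: any failing command reproduces the issue).
run_before_loop = BashOperator(
    task_id='run_before_loop',
    bash_command='exit 1',
    dag=dag)

# Dependent tasks that end up in state "upstream_failed".
for i in range(3):
    task = BashOperator(
        task_id='runme_{}'.format(i),
        bash_command='echo {}'.format(i),
        dag=dag)
    task.set_upstream(run_before_loop)
```

To "fix" the failing task for step 3, replace the bash_command of run_before_loop with "echo 1" and run the backfill again.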
Expected behavior:
Restarting the DAG re-executes all failed tasks, including those in state
"upstream_failed".
Observed behavior:
1. The failed task is not restarted.
2. None of the dependent tasks are restarted.
3. To get the DAG re-executed, its state must be cleared manually:
airflow clear -f -c new_example_bash_operator -s 2017-05-02 -e 2017-05-02
!DAG_cleared.gif!
After clearing, the same DAG can be restarted:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02
Then the task run_before_loop is executed, but all other tasks still remain
in state "upstream_failed". !DAG_partly_successful.gif!
To get all tasks executed, their state must be cleared explicitly.
Conclusion:
This is a blocker for production usage. We run several dozen DAGs with a
high number of tasks, and in a production environment there are always
failed tasks. In such cases, restarting a DAG must be straightforward.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)