Joe Schmid created AIRFLOW-1011:
-----------------------------------
Summary: Task Instance Results not stored for SubDAG Tasks
Key: AIRFLOW-1011
URL: https://issues.apache.org/jira/browse/AIRFLOW-1011
Project: Apache Airflow
Issue Type: Bug
Components: backfill, subdag
Affects Versions: Airflow 1.8
Reporter: Joe Schmid
Priority: Critical
Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png,
2-ZoomedSubDAG-NoTaskInstances-v1.8.png,
3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png
In previous Airflow versions, results for tasks executed as a subdag were
written as rows to task_instances. In Airflow 1.8 only rows for tasks inside
the top-level DAG (non-subdag tasks) seem to get written to the database.
As a result, the UI cannot show the status of task instances inside the
subdag, open the logs for those task instances, etc.
Here is a simple test DAG that exhibits the issue:
------------------------------------------------------------------------
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.models import DAG
from datetime import datetime, timedelta

args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 3, 1),
}

DAG_NAME = 'Test_SubDAG'
SUBDAG_OP = 'SubDagOp'


def get_test_subdag():
    subdag = DAG(
        dag_id='{}.{}'.format(DAG_NAME, SUBDAG_OP), default_args=args,
        # This is ignored, but it can't be None or @once
        schedule_interval="@daily")
    first = DummyOperator(
        task_id='SubDAG_Task1',
        dag=subdag
    )
    last = DummyOperator(
        task_id='SubDAG_Task2',
        dag=subdag
    )
    first >> last
    return subdag


dag = DAG(
    dag_id=DAG_NAME, default_args=args,
    schedule_interval=None,
    dagrun_timeout=timedelta(hours=1))

run_first = DummyOperator(
    task_id='DAG_Task1',
    dag=dag
)
run_subdag = SubDagOperator(
    subdag=get_test_subdag(),
    task_id=SUBDAG_OP,
    dag=dag
)
run_last = DummyOperator(
    task_id='DAG_Task2',
    dag=dag
)

run_first >> run_subdag
run_subdag >> run_last
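
To confirm the symptom outside the UI, one option is to count rows in the metadata database after a run of the DAG above. The following is only a rough sketch, not part of the original report. It assumes a SQLite metadata database reachable through a DB-API connection, that the table is named task_instance and has a dag_id column, and that subdag task instances are keyed by the subdag's own dag_id ('Test_SubDAG.SubDagOp' here). The helper name count_task_instances is hypothetical.

```python
import sqlite3


def count_task_instances(conn, dag_ids):
    """Return a {dag_id: row_count} dict for the given dag_ids.

    Assumption: `conn` is a sqlite3 connection to the Airflow metadata
    database and the table is named `task_instance` with a `dag_id` column.
    """
    counts = {}
    cur = conn.cursor()
    for dag_id in dag_ids:
        # Parameterized query; "?" is the sqlite3 placeholder style.
        cur.execute(
            "SELECT COUNT(*) FROM task_instance WHERE dag_id = ?",
            (dag_id,),
        )
        counts[dag_id] = cur.fetchone()[0]
    return counts
```

Called with ['Test_SubDAG', 'Test_SubDAG.SubDagOp'], a count of zero for the second id while the first is non-zero would match the behavior described in this issue. Other database backends would need their own driver and placeholder style.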