[ 
https://issues.apache.org/jira/browse/AIRFLOW-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293076#comment-16293076
 ] 

George Leslie-Waksman commented on AIRFLOW-790:
-----------------------------------------------

My working hypothesis is that Airflow may be inconsistent in how it constructs 
DagRuns versus how it constructs TaskInstances, and that this inconsistency 
interacts poorly with catchup=False.

It appears that DagRun creation, enumeration of the (dag, task, execution_date) 
tuples to execute, and TaskInstance creation are all done in different metadata 
db transactions. It looks possible (I am not 100% certain) for the DagRun to be 
rolled back after the task instances are enumerated but before the 
TaskInstances are created, and nothing checks that a DagRun exists before a 
TaskInstance is created.
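
To make the suspected gap concrete, here is a minimal sketch using an 
in-memory SQLite database. It is not Airflow code: the two-table schema, the 
'pending' list, and the orphan query are simplified assumptions that only 
mimic the sequence described above (create the DagRun, enumerate work, roll 
back, then create TaskInstances in a later transaction with no DagRun check).

```python
import sqlite3

# Autocommit mode so the transactions below are managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT
);
""")

# Transaction 1: create the DagRun and enumerate the work to do.
conn.execute("BEGIN")
conn.execute("INSERT INTO dag_run VALUES ('d', '2017-01-01', 'running')")
# Hypothetical in-memory list of (dag, task, execution_date) to execute.
pending = [("d", "t1", "2017-01-01"), ("d", "t2", "2017-01-01")]
# Something goes wrong and the DagRun is rolled back; 'pending' survives.
conn.execute("ROLLBACK")

# Transaction 2: create TaskInstances from the stale list, with no check
# that a matching DagRun exists.
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, 'scheduled')", pending
)
conn.execute("COMMIT")

# Count TaskInstances that have no DagRun at all.
orphans = conn.execute("""
    SELECT COUNT(*) FROM task_instance ti
    WHERE NOT EXISTS (
        SELECT 1 FROM dag_run dr
        WHERE dr.dag_id = ti.dag_id
          AND dr.execution_date = ti.execution_date
    )
""").fetchone()[0]
print(orphans)
```

If the real code path behaves like this, a guard before the second insert (or 
doing both steps in one transaction) would prevent the orphaned rows.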

This occurs in process_file and its sub-calls: 
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1709

Additionally, the _change_state_for_tis_without_dagruns method: 
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L944 
ignores TaskInstances without a DagRun and only processes TaskInstances with 
non-running DagRuns.

I have not found the specific part of process_file that is causing the problem, 
and it may be a red herring, so I plan to modify 
_change_state_for_tis_without_dagruns to also change the state of TaskInstances 
that do not have a DagRun.
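
The difference between the current and the proposed behaviour can be 
illustrated with a small sketch. This is not the Airflow implementation: the 
simplified tables, the sample rows, and the helper find_execution_dates are 
assumptions made for illustration. The inner join stands in for the behaviour 
described above, and the left outer join stands in for the planned change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT
);
INSERT INTO dag_run VALUES
    ('d', '2017-01-01', 'running'),
    ('d', '2017-01-02', 'failed');
INSERT INTO task_instance VALUES
    ('d', 't1', '2017-01-01', 'queued'),  -- DagRun is running
    ('d', 't1', '2017-01-02', 'queued'),  -- DagRun is not running
    ('d', 't1', '2017-01-03', 'queued');  -- no DagRun at all
""")

# Behaviour as described: only TaskInstances whose DagRun exists and is
# not running are selected; TaskInstances with no DagRun are skipped.
INNER_JOIN_QUERY = """
SELECT ti.execution_date FROM task_instance ti
JOIN dag_run dr
  ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
WHERE ti.state = 'queued' AND dr.state != 'running'
ORDER BY ti.execution_date
"""

# Proposed behaviour: also select TaskInstances with no DagRun at all.
OUTER_JOIN_QUERY = """
SELECT ti.execution_date FROM task_instance ti
LEFT OUTER JOIN dag_run dr
  ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
WHERE ti.state = 'queued'
  AND (dr.dag_id IS NULL OR dr.state != 'running')
ORDER BY ti.execution_date
"""

def find_execution_dates(query):
    """Return the execution dates of the TaskInstances a query selects."""
    return [row[0] for row in conn.execute(query)]

print(find_execution_dates(INNER_JOIN_QUERY))
print(find_execution_dates(OUTER_JOIN_QUERY))
```

With these sample rows, the first query misses the 2017-01-03 TaskInstance 
and the second one picks it up, so its state could then be changed along with 
the others.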

> DagRuns do not exist for certain tasks, but don’t get fixed
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-790
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-790
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Bolke de Bruin
>            Assignee: George Leslie-Waksman
>
> Log gets flooded without a suggestion what to do



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
