[
https://issues.apache.org/jira/browse/AIRFLOW-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293076#comment-16293076
]
George Leslie-Waksman commented on AIRFLOW-790:
-----------------------------------------------
My working hypothesis is that Airflow is inconsistent in how it constructs
DagRuns and how it constructs TaskInstances, and that this inconsistency
interacts badly with catchup=False.
It appears that creating the DagRun, enumerating the (dag, task,
execution_date) tuples to execute, and creating the TaskInstances are each done
in separate metadata db transactions. It looks possible (I am not 100% certain)
for the DagRun to be rolled back after the task instances are enumerated but
before the TaskInstances are created, and nothing checks that a DagRun exists
before a TaskInstance is created.
This occurs in process_file and its sub-calls:
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1709
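To make the missing check concrete, here is a minimal sketch of the kind of
guard described above. It is not Airflow code: it uses the standard-library
sqlite3 module, and the two-table schema, the make_demo_db helper, the
create_ti_if_dagrun_exists function and the "scheduled" state value are all
hypothetical simplifications chosen for illustration. The idea is only that
the existence check and the insert run inside one transaction.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical schema.

    These tables only mimic the (dag_id, task_id, execution_date) keys
    discussed above; they are not Airflow's real metadata tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (
            dag_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (dag_id, execution_date)
        );
        CREATE TABLE task_instance (
            dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (dag_id, task_id, execution_date)
        );
        """
    )
    return conn


def create_ti_if_dagrun_exists(conn, dag_id, task_id, execution_date):
    """Insert a task instance only if its DagRun row exists.

    Returns True if the row was inserted, False if no DagRun was found.
    """
    with conn:  # commits on success, rolls back on an exception
        # Open the transaction explicitly so the check and the insert
        # belong to the same transaction.
        conn.execute("BEGIN IMMEDIATE")
        dagrun = conn.execute(
            "SELECT 1 FROM dag_run WHERE dag_id = ? AND execution_date = ?",
            (dag_id, execution_date),
        ).fetchone()
        if dagrun is None:
            return False
        conn.execute(
            "INSERT INTO task_instance (dag_id, task_id, execution_date, state) "
            "VALUES (?, ?, ?, ?)",
            (dag_id, task_id, execution_date, "scheduled"),
        )
    return True
```

With this guard, a task instance whose DagRun was rolled back is simply not
created, rather than being created as an orphan.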
Additionally, the _change_state_for_tis_without_dagruns method:
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L944
ignores TaskInstances without a DagRun and only processes TaskInstances with
non-running DagRuns.
I have not found the specific part of process_file that is causing the problem,
and it may be a red herring, so I plan to modify
_change_state_for_tis_without_dagruns to also change the state of TaskInstances
that do not have a DagRun.
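A rough sketch of what that modified behaviour could look like follows. It is
not the actual change to jobs.py: it uses the standard-library sqlite3 module
against a simplified, hypothetical two-table schema, and the names
make_orphan_demo_db and reset_orphaned_task_instances, as well as the state
strings, are assumptions made for illustration. A single NOT EXISTS condition
covers both cases: TaskInstances whose DagRun is not running and TaskInstances
that have no DagRun at all.

```python
import sqlite3


def make_orphan_demo_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (
            dag_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (dag_id, execution_date)
        );
        CREATE TABLE task_instance (
            dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,
            PRIMARY KEY (dag_id, task_id, execution_date)
        );
        """
    )
    return conn


def reset_orphaned_task_instances(
    conn, old_states=("scheduled", "queued"), new_state=None
):
    """Reset task instances that lack a running DagRun.

    A task instance is reset when it is in one of old_states and there is
    no DagRun row in state 'running' for its (dag_id, execution_date).
    That includes task instances with no DagRun row at all.
    Returns the number of rows changed.
    """
    placeholders = ",".join("?" for _ in old_states)
    query = f"""
        UPDATE task_instance
        SET state = ?
        WHERE state IN ({placeholders})
          AND NOT EXISTS (
              SELECT 1 FROM dag_run dr
              WHERE dr.dag_id = task_instance.dag_id
                AND dr.execution_date = task_instance.execution_date
                AND dr.state = 'running'
          )
    """
    with conn:  # commit the update, or roll back on an exception
        cursor = conn.execute(query, (new_state, *old_states))
    return cursor.rowcount
```

Using NOT EXISTS instead of an inner join is the design point here: a join
against the DagRun table silently drops TaskInstances that have no DagRun
row, which is the gap described above.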
> DagRuns do not exist for certain tasks, but don’t get fixed
> -----------------------------------------------------------
>
> Key: AIRFLOW-790
> URL: https://issues.apache.org/jira/browse/AIRFLOW-790
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Bolke de Bruin
> Assignee: George Leslie-Waksman
>
> Log gets flooded without a suggestion what to do