[ https://issues.apache.org/jira/browse/AIRFLOW-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300943#comment-16300943 ]
George Leslie-Waksman commented on AIRFLOW-790:
-----------------------------------------------
https://github.com/apache/incubator-airflow/pull/2886 is ready.
It is a workaround, but it is also a safety check that would remain useful even
if the underlying bug were fixed: we still want to handle the case where
someone manually deletes a DagRun without removing its queued and scheduled
tasks.
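The safety check described above can be sketched as follows. This is a simplified, hypothetical model and not the code in the pull request: `TaskInstance` here is a plain dataclass rather than Airflow's ORM model, `reset_orphaned_task_instances` is an illustrative name, and a DagRun is assumed to be identified by its `(dag_id, execution_date)` pair. Queued or scheduled task instances with no matching DagRun have their state cleared; running ones are left alone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set, Tuple


# Hypothetical, simplified stand-in for a task instance row (not Airflow's model).
@dataclass
class TaskInstance:
    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[str]


def reset_orphaned_task_instances(
    task_instances: Iterable[TaskInstance],
    dag_run_keys: Set[Tuple[str, datetime]],
    old_states: Tuple[str, ...] = ("queued", "scheduled"),
    new_state: Optional[str] = None,
) -> List[TaskInstance]:
    """Move task instances in `old_states` whose DagRun is missing to `new_state`.

    Returns the task instances that were changed.
    """
    changed = []
    for ti in task_instances:
        has_dag_run = (ti.dag_id, ti.execution_date) in dag_run_keys
        if ti.state in old_states and not has_dag_run:
            ti.state = new_state
            changed.append(ti)
    return changed


d1 = datetime(2017, 12, 20)
d2 = datetime(2017, 12, 21)
tis = [
    TaskInstance("etl", "extract", d1, "scheduled"),  # its DagRun exists
    TaskInstance("etl", "extract", d2, "queued"),     # orphan: will be reset
    TaskInstance("etl", "load", d2, "running"),       # orphan, but running: untouched
]
changed = reset_orphaned_task_instances(tis, {("etl", d1)})
print([(ti.task_id, ti.execution_date.day, ti.state) for ti in changed])
# → [('extract', 21, None)]
```

Clearing the state (rather than marking the task instance failed) is one possible design choice; it lets the scheduler reconsider the task instance if a DagRun is later created for that date.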
I hope to come back and resolve the root cause that creates these task
instances, but the code is rather tricky to follow and this bug is causing us
production issues right now. A partial fix also seemed better than no fix at
all.
For us, tasks occasionally run long enough to overlap two subsequent execution
dates. These DAGs run with catchup=False, max_active_runs=1 and
concurrency=16. When this happens, a number of task instances get scheduled for
the intervening execution date, but the DagRun itself is skipped (never
created) because of catchup=False. These orphaned task instances still count
against concurrency, so after this happens a few times the DAG simply stops
scheduling tasks altogether until we go in and manually mark all of those task
instances as failed.
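The way orphaned task instances starve a DAG of concurrency slots can be illustrated with a toy model. This is an assumption-laden sketch, not Airflow's scheduler code: which states count against the limit (`STATES_COUNTED_AS_RUNNING`) and the figure of four orphans left behind per overlap are both hypothetical.

```python
from typing import Iterable

# Assumption: states that count against the DAG-level concurrency limit.
# Orphaned "queued" task instances are counted even though no DagRun exists.
STATES_COUNTED_AS_RUNNING = frozenset({"queued", "running"})


def open_slots(states: Iterable[str], concurrency: int) -> int:
    """Return how many more task instances the DAG may start."""
    used = sum(1 for s in states if s in STATES_COUNTED_AS_RUNNING)
    return max(concurrency - used, 0)


concurrency = 16
orphaned = []
# Hypothetical: each long-running overlap leaves 4 queued orphans behind.
for overlap in range(1, 5):
    orphaned.extend(["queued"] * 4)
    print(overlap, open_slots(orphaned, concurrency))
# 1 12
# 2 8
# 3 4
# 4 0

# Manually marking the orphans as failed frees the slots again.
orphaned = ["failed"] * len(orphaned)
print(open_slots(orphaned, concurrency))  # 16
```

After the fourth overlap the model has no free slots left, which matches the symptom above: nothing new is scheduled until the orphaned task instances are moved out of a counted state.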
> DagRuns do not exist for certain tasks, but don’t get fixed
> -----------------------------------------------------------
>
> Key: AIRFLOW-790
> URL: https://issues.apache.org/jira/browse/AIRFLOW-790
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Bolke de Bruin
> Assignee: George Leslie-Waksman
>
> Log gets flooded without a suggestion of what to do