[ https://issues.apache.org/jira/browse/AIRFLOW-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300943#comment-16300943 ]

George Leslie-Waksman commented on AIRFLOW-790:
-----------------------------------------------

https://github.com/apache/incubator-airflow/pull/2886 is ready.

It is a workaround, but it is also a safety check that would remain useful even
once the underlying bug is fixed: we still want it to handle the case where
someone manually deletes a DagRun without removing its queued and scheduled
tasks.
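To make the safety check concrete, here is a minimal sketch of the idea,
assuming plain in-memory records instead of Airflow's database models. The
`TaskInstance` dataclass, `find_orphans`, `fail_orphans` and the choice of
"scheduled"/"queued" as the pending states are all hypothetical illustrations,
not the code in the pull request. It flags pending task instances whose
(dag_id, execution_date) has no matching DagRun and marks them failed,
mirroring the manual fix of marking such task instances as failed.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical, simplified stand-in for a task instance record; this is not
# Airflow's TaskInstance model and not the code in the pull request.
@dataclass
class TaskInstance:
    dag_id: str
    task_id: str
    execution_date: datetime
    state: str


# Assumption: these are the states left behind when a DagRun is skipped
# or deleted while its task instances are still waiting to run.
PENDING_STATES = frozenset({"scheduled", "queued"})


def find_orphans(task_instances, dag_run_keys):
    """Return pending task instances whose (dag_id, execution_date) has no DagRun."""
    return [
        ti
        for ti in task_instances
        if ti.state in PENDING_STATES
        and (ti.dag_id, ti.execution_date) not in dag_run_keys
    ]


def fail_orphans(task_instances, dag_run_keys):
    """Mark orphaned pending task instances as failed and return how many changed."""
    orphans = find_orphans(task_instances, dag_run_keys)
    for ti in orphans:
        ti.state = "failed"
    return len(orphans)


day1, day2 = datetime(2017, 12, 20), datetime(2017, 12, 21)
tis = [
    TaskInstance("etl", "extract", day1, "running"),
    TaskInstance("etl", "extract", day2, "queued"),  # no DagRun for day2
    TaskInstance("etl", "load", day2, "scheduled"),  # no DagRun for day2
]
print(fail_orphans(tis, dag_run_keys={("etl", day1)}))  # prints 2
```

Failing the orphans (rather than deleting them) keeps a visible record of what
was cleaned up; a real implementation might reasonably pick a different state.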

I hope to come back and resolve the root cause that creates these task
instances, but the code is rather tricky to follow and this is causing us
production issues right now. It also seemed better to contribute a partial fix
than no fix at all.

For us, tasks occasionally run long enough to overlap two subsequent execution
dates. These DAGs run with catchup=False, max_active_runs=1 and
concurrency=16. When this happens, a number of task instances get scheduled for
the intervening execution date, but the DagRun itself is skipped (never
created) because of catchup=False. Those task instances still count against
concurrency, so after this happens a few times the DAG stops scheduling tasks
altogether until we go in and manually mark all of those task instances as
failed.
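The way stranded task instances exhaust the concurrency limit can be shown
with a small model. This is a hypothetical sketch, not Airflow's scheduler
code: `free_slots` is an illustrative helper, the assumption that "running"
and "queued" task instances occupy a concurrency slot is a simplification, and
the figure of 4 stranded task instances per overlap is made up for the
example. Only concurrency=16 comes from the comment.

```python
# Hypothetical model of the failure mode, NOT Airflow's scheduler code.
# Assumption: task instances in these states occupy a slot of the DAG's
# `concurrency` limit whether or not their DagRun exists.
COUNTED_STATES = frozenset({"running", "queued"})


def free_slots(states, concurrency=16, counted_states=COUNTED_STATES):
    """Return how many more task instances the DAG may start."""
    used = sum(1 for state in states if state in counted_states)
    return max(concurrency - used, 0)


# Assume each long-running overlap strands 4 queued task instances whose
# DagRun was never created (the number 4 is illustrative only).
stranded = []
for overlap in range(1, 5):
    stranded.extend(["queued"] * 4)
    print(f"after overlap {overlap}: {free_slots(stranded)} free slots")
```

Under these assumptions the loop prints 12, 8, 4 and then 0 free slots: after
the fourth overlap nothing new can start, and because no DagRun exists for the
stranded task instances, nothing ever moves them out of the counted states.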

> DagRuns do not exist for certain tasks, but don’t get fixed
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-790
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-790
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Bolke de Bruin
>            Assignee: George Leslie-Waksman
>
> Log gets flooded without a suggestion of what to do



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
