I was very surprised to find that if you set an end_date on any of the tasks in a DAG, the scheduler won't create DagRuns after the earliest (minimum) end_date among those tasks. The code that does this is the 6 or so lines starting here: https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L867
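To make the rule concrete, here is a minimal sketch in plain Python. It is not the code from jobs.py: the function names and the list-of-dates representation are made up for illustration, and it only models the date comparison. The first function is the behavior described above; the second is the alternative rule suggested later in this message, where the scheduler stops only once every task has an end_date and all of them have passed.

```python
from datetime import date
from typing import Optional, Sequence


def blocked_by_min_end_date(next_run_date: date,
                            task_end_dates: Sequence[Optional[date]]) -> bool:
    """Observed rule (hypothetical helper): no DagRun once next_run_date
    is later than the earliest end_date set on any task."""
    set_dates = [d for d in task_end_dates if d is not None]
    return bool(set_dates) and next_run_date > min(set_dates)


def blocked_by_max_end_date(next_run_date: date,
                            task_end_dates: Sequence[Optional[date]]) -> bool:
    """Alternative rule (hypothetical helper): no DagRun only when every
    task has an end_date and next_run_date is later than the latest one."""
    if not task_end_dates or any(d is None for d in task_end_dates):
        # At least one task can still run, so keep creating DagRuns.
        return False
    return next_run_date > max(task_end_dates)


# One task with no end_date, one task with end_date 2018-02-02.
task_end_dates = [None, date(2018, 2, 2)]
for day in (date(2018, 2, 1), date(2018, 2, 2), date(2018, 2, 3)):
    print(day,
          "min rule blocks:", blocked_by_min_end_date(day, task_end_dates),
          "| max rule blocks:", blocked_by_max_end_date(day, task_end_dates))
```

Under the min rule the run for 2018-02-03 is blocked even though one task has no end_date; under the max rule it is not.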
So if, for example, I have:

- a DAG with a start_date of 2018-02-01, no specific end_date, and a daily schedule
- one task in that DAG with no specified end_date
- a second task in that DAG with an end_date of 2018-02-02

The scheduler will create DagRuns for 2018-02-01 and 2018-02-02 but will not create a DagRun for 2018-02-03 or later.

That seems completely counterintuitive to me. I would expect the scheduler to keep creating DagRuns so that the first task can keep running. Interestingly, if I manually create a DagRun for 2018-02-03, the scheduler then schedules only the first task for that execution_date and actually respects the end_date of the second task.

The only alternative to adding an end_date to a task is to edit the DAG and remove those tasks from the DAG entirely. However, that means the webserver is no longer aware of those tasks and I can't look at their historical behavior in the UI.

Does anyone have an explanation for why this logic is there? Is there some necessary use case for that restriction that I'm not thinking about? I could see a similar piece of code that checks whether all tasks in the DAG have specified end_dates and prevents the scheduler from creating DagRuns past the MAX of those dates. There is no point in creating DagRuns if none of the tasks are going to be run, but as long as at least one task can run for that execution_date, I think the scheduler should create it.

Thanks,
Chris
