That does sound like a bug, and I would have expected, as you did, that not
specifying an end_date on some tasks means those tasks should run forever.
The change that probably needs making is that an end_date of None on a task
should be "greater" than other task end dates in/around the lines you linked to.
Do we need to add a TIDep to ensure the exec date is less than the task end
date?
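
To make the "None is greater" idea concrete, here is a minimal sketch of the
comparison. It is plain Python, not actual Airflow scheduler code, and the
function names (latest_useful_date, should_create_dag_run) are hypothetical.
It assumes a task end_date is inclusive, which matches the 2018-02-02 DagRun
being created in your example.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_useful_date(task_end_dates: Iterable[Optional[datetime]]) -> Optional[datetime]:
    """Return the last date on which at least one task can still run.

    A return value of None means "no limit". Hypothetical helper, not part
    of Airflow.
    """
    end_dates = list(task_end_dates)
    # A task without an end_date runs forever, so it outranks any real date.
    # An empty task list is also treated as "no limit" (an assumption).
    if not end_dates or any(d is None for d in end_dates):
        return None
    # Every task has an end_date: the latest one is the real cut-off.
    return max(end_dates)


def should_create_dag_run(execution_date: datetime,
                          task_end_dates: Iterable[Optional[datetime]]) -> bool:
    """True if at least one task could run for this execution_date."""
    limit = latest_useful_date(task_end_dates)
    return limit is None or execution_date <= limit
```

With your example (one task with no end_date, one ending 2018-02-02),
should_create_dag_run(datetime(2018, 2, 3), [None, datetime(2018, 2, 2)])
is True, so the run for 2018-02-03 would still be created.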
> On 21 Feb 2018, at 20:58, Chris Palmer <ch...@crpalmer.com> wrote:
> I was very surprised to find that if you set an end_date on any of the
> tasks in a DAG, the scheduler won't create DagRuns after the minimum
> end_date of tasks. The code that does this is the 6 or so lines starting
> here -
> So if for example I have:
> - a DAG with a start_date of 2018-02-01, no specific end_date and a
> daily schedule
> - One task in that DAG with no specified end_date
> - A second task in that DAG with an end_date of 2018-02-02
> The scheduler will create DagRuns for 2018-02-01 and 2018-02-02 but will
> not create a DagRun for 2018-02-03 or later.
> That seems completely counterintuitive to me. I would expect the scheduler
> to keep creating DagRuns so that the first task can keep running.
> Interestingly, if I manually created a DagRun for 2018-02-03 then the
> scheduler would only schedule the first task for that execution_date
> and actually respects the end_date of the second task.
> The only alternative to adding an end_date to a task is to edit the DAG and
> remove those tasks from the DAG entirely. However, that means the webserver
> is no longer aware of those tasks and I can't look at the historical
> behavior in the UI.
> Does anyone have an explanation for why this logic is there? Is there some
> necessary use case for that restriction that I'm not thinking about?
> I could see a similar piece of code that checks to see if all tasks in the
> DAG have specified end_dates and prevents the scheduler from creating
> DagRuns past the MAX of those dates. There is no point in creating
> DagRuns if none of the tasks are going to be run, but as long as at least
> one task can run for that execution_date I think the scheduler should
> create it.