Bolke, Sid and I had a brief conversation to discuss some of the implications of https://github.com/airbnb/airflow/issues/1427
There are two large points that need to be addressed:

1. This particular issue arises because of a misalignment between start_date and schedule_interval. This can only happen with cron-based schedule_intervals that describe absolute points in time (like “1am”) as opposed to time deltas (like “every hour”). Ironically, I once reported this same issue myself (#959). In the past (and in the docs) we have simply said that users must make sure the two params agree. We discussed the possibility of a DAG validation method that raises an error if the start_date and schedule_interval don’t align, but Bolke made the point (and I agreed) that in these cases, start_date is sort of like telling the scheduler to “start paying attention” as opposed to “this is my first execution date”. In #1427, the scheduler was being asked to start paying attention on 4/24/16 00:00:00 but not to do anything until 4/24/16 01:10:00. However, it was scheduling a first run at midnight and a second run at 1:10. Regardless of whether we choose to validate/warn/error, Bolke is going to change the scheduling logic so that the cron-based interval takes precedence over the start_date. Specifically, the first date on or after the start_date that complies with the schedule_interval becomes the first execution date.

2. Point 1 led to a second issue: depends_on_past checks for a successful task instance (TI) at `execution_date - schedule_interval`. This is fragile, since the previous TI may have run at any time in the past, not just one schedule_interval ago. This can happen easily with ad-hoc DAG runs, and also if a DAG was paused for a while. Less commonly, it happens in the situation described in point 1, where the first scheduled run is off-schedule (the midnight run followed by the daily 1:10am runs). The clear fix seems to be to have depends_on_past check the last TI that ran, regardless of whether it ran exactly one `schedule_interval` ago. That's in line with the intent of the flag.
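
To make the two rules concrete, here is a rough sketch in plain Python. This is not the Airflow code: `first_execution_date` and `previous_ti_succeeded` are hypothetical helpers, the cron expression is reduced to a single daily time of day, and TI history is just a list of `(execution_date, state)` tuples. Treating "no previous TI" as satisfied is also an assumption of the sketch.

```python
from datetime import datetime, time, timedelta


def first_execution_date(start_date, run_at):
    """Return the first datetime on or after start_date whose time of day is run_at.

    Hypothetical helper: a stand-in for a daily cron schedule such as "1:10am".
    """
    candidate = datetime.combine(start_date.date(), run_at)
    if candidate < start_date:
        # Today's slot has already passed, so the first run is tomorrow's slot.
        candidate += timedelta(days=1)
    return candidate


def previous_ti_succeeded(history, execution_date):
    """Check the most recent TI before execution_date, however long ago it ran.

    history is a list of (execution_date, state) tuples (an assumed shape,
    not Airflow's data model).
    """
    earlier = [(d, s) for d, s in history if d < execution_date]
    if not earlier:
        # Assumption: with no previous TI there is nothing to depend on.
        return True
    _, state = max(earlier, key=lambda pair: pair[0])
    return state == "success"


# The #1427 case: start paying attention at midnight, first run at 1:10.
start = datetime(2016, 4, 24, 0, 0, 0)
first_run = first_execution_date(start, time(1, 10))

# A TI from several days earlier still counts as "the previous TI".
history = [(datetime(2016, 4, 20, 1, 10), "success")]
ok_to_run = previous_ti_succeeded(history, first_run)
```

With this shape, the midnight run never gets scheduled, and the depends_on_past check no longer cares about the gap between runs.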
I will submit a fix. -J
