mai-nakagawa commented on PR #23662:
URL: https://github.com/apache/airflow/pull/23662#issuecomment-1136719955

   Thanks for your comment, @uranusjr.
   
   > Instead of inheriting `CronDataIntervalTimetable` directly, I would prefer 
the common logic be extracted out to a superclass, which both timetable clases 
inherit.
   
   Agree.
   
   > I feel this class should also schedule tasks differently. As described in 
AIP-39, a common use case for this kind of timetables is for tasks to run at 
the end of each weekday (say 5pm). This means the cron expression should 
express the _end_ of a data interval, instead of the beginning like 
`CronDataIntervalTimetable`. In other words, `PlainCronTimetable("0 5 * * 
1-5")` triggers a DAG run at 5pm on each Monday through Friday, with each run 
covering the interval between the triggering time of this and the previous run. 
This would mean we need more different logic between this class and 
`CronDataIntervalTimetable`.
   
   I don't fully understand. In my understanding, even with the current 
`CronDataIntervalTimetable`, we express the _end_ (not the _beginning_) of a 
data interval. For example, `CronDataIntervalTimetable("0 5 * * MON-FRI")` 
triggers a DAG run at 5am on each Monday through Friday. Kindly correct me if 
I'm wrong.
   
   However, I agree `CronTimetable` (instead of `CronDataIntervalTimetable`), 
which ignores the idea of "data interval", would be appreciated. It means 
`execution_date`, `data_interval_start` and `data_interval_end` will be the 
same - what time the DAG run starts. It will show what time the next DAG Run 
will be triggered exactly in the Web UI:
   
   
![next_run](https://user-images.githubusercontent.com/2883424/170180733-c8849c64-9032-48a0-a04b-095443924dc6.png)
   
   > Honestly the current behaviour sounds like a bug to me, and arguably we 
might want to just fix that instead of introducing a new timetable. Most users 
probably don’t really care about that first run, and we could introduce a 
custom timetable to support the legacy first run behaviour. (The main 
difference is, I think we should change the `schedule_interval` behaviour.)
   
   I personally okay with your idea though I'm not too sure if most users don't 
really care about the first run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to