collinmcnulty commented on code in PR #38505: URL: https://github.com/apache/airflow/pull/38505#discussion_r1540191934
########## docs/apache-airflow/authoring-and-scheduling/timetable.rst: ########## @@ -19,49 +19,49 @@ Timetables ========== -For DAGs with time-based schedules (as opposed to event-driven), the scheduling -decisions are driven by its internal "timetable". The timetable also -determines the data interval and the logical date of each run created for the DAG. +For DAGs with time-based schedules (as opposed to event-driven), the DAGs internal "timetable" +drives scheduling. The timetable also determines the data interval and the logical date of +each run created for the DAG. DAGs scheduled with a cron expression or ``timedelta`` object are internally converted to always use a timetable. If a cron expression or ``timedelta`` is sufficient for your use case, you don't need to worry about writing a custom timetable because Airflow has default timetables that handle those cases. But for more complicated scheduling requirements, -you may create your own timetable class and pass that to the DAG's ``schedule`` argument. +you can create your own timetable class and pass that to the DAG's ``schedule`` argument. -Here are some examples of when custom timetable implementations are useful: +Some examples of when custom timetable implementations are useful: -* Data intervals with "holes" between. (Instead of continuous, as both the cron - expression and ``timedelta`` schedules represent.) -* Run tasks at different times each day. For example, an astronomer may find it +* Data intervals with "holes" between intervals instead of a continuous interval, as both the cron Review Comment: I think introducing the interval here might be confusing. ########## docs/apache-airflow/authoring-and-scheduling/timetable.rst: ########## @@ -19,49 +19,49 @@ Timetables ========== -For DAGs with time-based schedules (as opposed to event-driven), the scheduling -decisions are driven by its internal "timetable". The timetable also -determines the data interval and the logical date of each run created for the DAG. +For DAGs with time-based schedules (as opposed to event-driven), the DAGs internal "timetable" +drives scheduling. The timetable also determines the data interval and the logical date of +each run created for the DAG. DAGs scheduled with a cron expression or ``timedelta`` object are internally converted to always use a timetable. If a cron expression or ``timedelta`` is sufficient for your use case, you don't need to worry about writing a custom timetable because Airflow has default timetables that handle those cases. But for more complicated scheduling requirements, -you may create your own timetable class and pass that to the DAG's ``schedule`` argument. +you can create your own timetable class and pass that to the DAG's ``schedule`` argument. -Here are some examples of when custom timetable implementations are useful: +Some examples of when custom timetable implementations are useful: -* Data intervals with "holes" between. (Instead of continuous, as both the cron - expression and ``timedelta`` schedules represent.) -* Run tasks at different times each day. For example, an astronomer may find it +* Data intervals with "holes" between intervals instead of a continuous interval, as both the cron Review Comment: maybe link to an explanation of data_interval conceptually? If such a doc exists ########## docs/apache-airflow/authoring-and-scheduling/timetable.rst: ########## @@ -216,46 +215,45 @@ Timetables comparisons Differences between the two cron timetables ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -There are two timetables `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ that accepts a cron expression. -There are some differences between the two: -- `CronTriggerTimetable`_ does not take care of *Data Interval*, while `CronDataIntervalTimetable`_ does. -- The time when a DAG run is triggered by `CronTriggerTimetable`_ is more intuitive and more similar to what people -expect cron to behave than that of `CronDataIntervalTimetable`_ (when ``catchup`` is ``False``). +Airflow has two timetables `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ that accept a cron expression. +However, there are differences between the two: +- `CronTriggerTimetable`_ does not address *Data Interval*, while `CronDataIntervalTimetable`_ does. +- The time when a DAG run is triggered by `CronTriggerTimetable`_ is more similar to how people +expect cron to behave compared to `CronDataIntervalTimetable`_ when ``catchup`` is ``False``. Whether taking care of *Data Interval* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -`CronTriggerTimetable`_ *does not* care the idea of *data interval*. It means the value of ``data_interval_start``, -``data_interval_end`` and legacy ``execution_date`` are the same - the time when a DAG run is triggered. +`CronTriggerTimetable`_ *does not* include *data interval*. This means that the value of ``data_interval_start`` and +``data_interval_end`` (and the legacy ``execution_date``) are the same; the time when a DAG run is triggered. -On the other hand, `CronDataIntervalTimetable`_ *does* care the idea of *data interval*. It means the value of -``data_interval_start`` and ``data_interval_end`` (and legacy ``execution_date``) are different. They are the start -and end of the interval respectively. +However, `CronDataIntervalTimetable`_ *does* include *data interval*. This means the value of +``data_interval_start`` and ``data_interval_end`` (and legacy ``execution_date``) are different. ``data_interval_start`` is the time when a +DAG run is triggered and ``data_interval_end`` is the end of the interval. The time when a DAG run is triggered ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There is no difference between the two when ``catchup`` is ``True``. :ref:`dag-catchup` tells you how DAG runs are +There is no difference between `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ when ``catchup`` is ``True``. :ref:`dag-catchup` tells you how DAG runs are triggered when ``catchup`` is ``True``. -When ``catchup`` is ``False``, there is difference in how a new DAG run is triggered. `CronTriggerTimetable`_ triggers -a new DAG run *after* the current time, while `CronDataIntervalTimetable`_ does *before* the current time (assuming -the value of ``start_date`` is past time). +When ``catchup`` is ``False``, there is difference in how a new DAG run is triggered. -Here is an example showing how the first DAG run is triggered. Supposes there is a cron expression ``@daily`` or -``0 0 * * *``, which is aimed to run at 12AM every day. If you enable DAGs using the two timetables at 3PM on January -31st, `CronTriggerTimetable`_ will trigger a new DAG run at 12AM on February 1st. `CronDataIntervalTimetable`_, on the other -hand, will immediately trigger a new DAG run which is supposed to trigger at 12AM on January 31st if the DAG had been -enabled beforehand. +`CronTriggerTimetable`_ triggers a new DAG run *after* the current time, while `CronDataIntervalTimetable`_ triggers a DAG run *before* the current time, assuming Review Comment: This feels confusing. Both timetables will kick off runs at the same time, given the same cron string, but the _run_id_ for those runs will be as you describe here. I think we should explicitly call out that the `CronDataIntervalTimetable`'s run_id timestamp does NOT match the time it actually started running. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
