uranusjr commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r707090443
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,33 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
- If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
- will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
- started once the period it covers has ended. The ``execution_date`` available in the context
- will also be 2020-01-01.
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day and end at midnight of the
+next day.
- The first DAG Run is created based on the minimum ``start_date`` for the tasks in your DAG.
- Subsequent DAG Runs are created by the scheduler process, based on your DAG’s ``schedule_interval``,
- sequentially. If your start_date is 2020-01-01 and schedule_interval is @daily, the first run
- will be created on 2020-01-02 i.e., after your start date has passed.
+A DAG run is scheduled *after* its associated data interval has ended, to ensure
+the run is able to collect all the data within the time period. Therefore, a run
+covering the data period of 2020-01-01 will not start to run until 2020-01-01
+has ended, i.e. after 2020-01-02 00:00:00.
Review comment:
I made the wording less certain here to account for the custom
timetable edge cases. I don’t think running a DAG before its data interval ends
makes enough general sense to complicate the explanation, so this is done with
some “usually”s and “generally”s.
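
For reference, the usual ``@daily`` case described in the hunk can be sketched
with plain ``datetime`` arithmetic. This is only an illustration, not Airflow's
scheduler code: ``daily_data_interval`` is a hypothetical helper, UTC is an
assumed timezone, and custom timetables may behave differently.

```python
from datetime import datetime, timedelta, timezone


def daily_data_interval(day):
    """Return (start, end) of the one-day interval containing ``day``.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # The interval starts at midnight of the given day...
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    # ...and ends at midnight of the next day.
    end = start + timedelta(days=1)
    return start, end


start, end = daily_data_interval(datetime(2020, 1, 1, tzinfo=timezone.utc))

# In the usual case, the run covering this interval is not scheduled
# before the interval has ended.
earliest_run = end

print("data interval:", start.isoformat(), "to", end.isoformat())
print("run scheduled no earlier than:", earliest_run.isoformat())
```

So the run covering 2020-01-01 generally starts only once 2020-01-02 00:00:00
has been reached, which matches the example in the new text.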