uranusjr commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r690235892
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,33 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
-    If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
-    will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
-    started once the period it covers has ended. The ``execution_date`` available in the context
-    will also be 2020-01-01.
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data intervals would start at midnight of each day and end at midnight of the
+next day.
-    The first DAG Run is created based on the minimum ``start_date`` for the tasks in your DAG.
-    Subsequent DAG Runs are created by the scheduler process, based on your DAG’s ``schedule_interval``,
-    sequentially. If your start_date is 2020-01-01 and schedule_interval is @daily, the first run
-    will be created on 2020-01-02 i.e., after your start date has passed.
+A DAG run is scheduled *after* its associated data interval has ended, to ensure
+the run is able to collect all the data within the time period. Therefore, a run
+covering the data period of 2020-01-01 will not start running until 2020-01-01
+has ended, i.e. after 2020-01-02 00:00:00.
Review comment:
Technically yes; I wrote it like this to avoid going into too much “well
except” stuff this early in the documentation. It’d definitely be best if we
can explain this default “DAG skips one interval” behaviour in another way.
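
To make the interval arithmetic concrete, here is a minimal sketch in plain Python. `daily_data_interval` is a hypothetical helper written for illustration only, not Airflow's actual API; it just shows why the run covering 2020-01-01 cannot fire before 2020-01-02 00:00:00.

```python
from datetime import datetime, timedelta

def daily_data_interval(logical_date: datetime) -> tuple[datetime, datetime]:
    # Hypothetical helper (not Airflow's actual API): for an @daily schedule,
    # the data interval starts at midnight of the logical date and ends at
    # midnight of the following day.
    start = logical_date.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return start, end

start, end = daily_data_interval(datetime(2020, 1, 1))
# The run is only scheduled after its data interval has ended, so `end` is
# the earliest moment the 2020-01-01 run can be triggered.
print(start)  # 2020-01-01 00:00:00
print(end)    # 2020-01-02 00:00:00
```

This also illustrates the "skips one interval" impression: with a ``start_date`` of 2020-01-01 and an ``@daily`` schedule, the first run's interval ends at 2020-01-02, which is when the first run is created.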
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]