ashb commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r709108678



##########
File path: docs/apache-airflow/faq.rst
##########
@@ -295,7 +311,7 @@ commonly attempted in ``user_defined_macros``.
 
        bo = BashOperator(task_id="my_task", bash_command="echo {{ my_custom_macro }}", dag=dag)
 
-This will echo "day={{ ds }}" instead of "day=2020-01-01" for a dagrun with the execution date 2020-01-01 00:00:00.
+This will echo "day={{ ds }}" instead of "day=2020-01-01" for a dagrun with ``data_interval_start`` 2020-01-01 00:00:00.

Review comment:
    ```suggestion
    This will echo "day={{ ds }}" instead of "day=2020-01-01" for a dagrun with a ``data_interval_start`` of 2020-01-01 00:00:00.
    ```
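The gotcha this hunk documents is that template rendering is single-pass: a ``user_defined_macros`` value that itself contains ``{{ ds }}`` is inserted verbatim and never rendered again. A stdlib-only sketch of single-pass substitution (illustrative only, not Airflow's Jinja renderer):

```python
import re

def render_once(template, ctx):
    # Single pass: each {{ name }} found in the *original* template is
    # replaced, and the substituted values are never re-scanned for
    # further template markers (re.sub does not rescan replacements).
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(ctx[m.group(1)]), template)

ctx = {"my_custom_macro": "day={{ ds }}", "ds": "2020-01-01"}
out = render_once("echo {{ my_custom_macro }}", ctx)
# The embedded {{ ds }} survives un-rendered, matching the behaviour
# the FAQ entry warns about.
```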

##########
File path: docs/apache-airflow/concepts/dags.rst
##########
@@ -148,11 +148,20 @@ The ``schedule_interval`` argument takes any value that is a valid `Crontab <htt
     with DAG("my_daily_dag", schedule_interval="0 * * * *"):
         ...
 
-Every time you run a DAG, you are creating a new instance of that DAG which Airflow calls a :doc:`DAG Run </dag-run>`. DAG Runs can run in parallel for the same DAG, and each has a defined ``execution_date``, which identifies the *logical* date and time it is running for - not the *actual* time when it was started.
+.. tip::
+
+    For more information on ``schedule_interval`` values, see :doc:`DAG Run </dag-run>`.
+
+    If ``schedule_interval`` is not enough to express the DAG's schedule, see :doc:`Timetables </howto/timetable>`.
+
+Every time you run a DAG, you are creating a new instance of that DAG which Airflow calls a :doc:`DAG Run </dag-run>`. DAG Runs can run in parallel for the same DAG, and each has a defined data interval, which identifies the *logical* date and time range it is running for - not the *actual* time when it was started.

Review comment:
    ```suggestion
    Every time you run a DAG, you are creating a new instance of that DAG which Airflow calls a :doc:`DAG Run </dag-run>`. DAG Runs can run in parallel for the same DAG, and each has a defined data interval, which identifies the period of data the tasks should operate on.
    ```
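The data-interval semantics described in this hunk can be sketched in plain Python (illustrative only, not Airflow's scheduler; ``daily_interval`` is a hypothetical helper):

```python
from datetime import datetime, timedelta

def daily_interval(logical_date):
    """Illustrative sketch: for an ``@daily`` schedule, the data interval
    starts at the logical date (midnight) and ends at midnight of the
    next day."""
    start = logical_date
    end = start + timedelta(days=1)
    return start, end

start, end = daily_interval(datetime(2020, 1, 1))
# The run covering 2020-01-01 is only scheduled once its interval has
# ended, so the earliest it can actually start is the interval end.
earliest_start = end
```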

##########
File path: docs/apache-airflow/concepts/operators.rst
##########
@@ -66,20 +66,20 @@ Jinja Templating
 ----------------
 Airflow leverages the power of `Jinja Templating <http://jinja.pocoo.org/docs/dev/>`_ and this can be a powerful tool to use in combination with :ref:`macros <templates-ref>`.
 
-For example, say you want to pass the execution date as an environment variable to a Bash script using the ``BashOperator``:
+For example, say you want to pass the start of the data interval as an environment variable to a Bash script using the ``BashOperator``:
 
 .. code-block:: python
 
-  # The execution date as YYYY-MM-DD
+  # The start of the data interval as YYYY-MM-DD
   date = "{{ ds }}"

Review comment:
       Should we use something other than `ds` here now?
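For context on the reviewer's question: in Airflow 2.2 the context still provides ``ds``, now defined as the start of the data interval formatted ``YYYY-MM-DD``. The relationship can be sketched with stdlib ``datetime`` (variable names here are illustrative, not the real template-context implementation):

```python
from datetime import datetime

# Assume a run whose data interval starts at this moment (hypothetical value).
data_interval_start = datetime(2020, 1, 1)

# `ds` is just the interval start rendered as YYYY-MM-DD.
ds = data_interval_start.strftime("%Y-%m-%d")

# The templated command from the docs example would therefore expand to:
bash_command = 'echo "date={}"'.format(ds)
```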

##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,36 @@ Cron Presets
 Your DAG will be instantiated for each schedule along with a corresponding
 DAG Run entry in the database backend.
 
-.. note::
 
-    If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
-    will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
-    started once the period it covers has ended.  The ``execution_date`` available in the context
-    will also be 2020-01-01.
+.. _data-interval:
 
-    The first DAG Run is created based on the minimum ``start_date`` for the tasks in your DAG.
-    Subsequent DAG Runs are created by the scheduler process, based on your DAG’s ``schedule_interval``,
-    sequentially. If your start_date is 2020-01-01 and schedule_interval is @daily, the first run
-    will be created on 2020-01-02 i.e., after your start date has passed.
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day and end at midnight of the
+next day.
+
+A DAG run is usually scheduled *after* its associated data interval has ended,
+to ensure the run is able to collect all the data within the time period. In
+other words, a run covering the data period of 2020-01-01 generally does not
+start to run until 2020-01-01 has ended, i.e. after 2020-01-02 00:00:00.
+
+All dates in Airflow are tied to the data interval concept in some way. The
+"logical date" (also called ``execution_date`` in Airflow versions prior to 2.2)
+of a DAG run, for example, denotes the start of the data interval, not when the

Review comment:
       I didn't think logical date and start date were strictly tied to each other -- couldn't a custom timetable choose to do something else here?
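A custom timetable can indeed decouple the two. A stdlib-only sketch of the idea (the ``RunInfo`` shape is hypothetical, loosely modeled on the run-info object a timetable returns, with the run moment and data interval chosen independently):

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class RunInfo(NamedTuple):
    """Hypothetical stand-in: a timetable yields both the moment to
    schedule the run and the data interval, as independent fields."""
    run_after: datetime
    interval_start: datetime
    interval_end: datetime

# A conventional daily timetable: logical date == interval start,
# and the run fires once the interval has ended.
conventional = RunInfo(
    run_after=datetime(2020, 1, 2),
    interval_start=datetime(2020, 1, 1),
    interval_end=datetime(2020, 1, 2),
)

# A custom timetable could anchor the interval elsewhere, e.g. a
# trailing 7-day window ending at the moment the run fires.
trailing_week = RunInfo(
    run_after=datetime(2020, 1, 2),
    interval_start=datetime(2020, 1, 2) - timedelta(days=7),
    interval_end=datetime(2020, 1, 2),
)
```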




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
