jedcunningham commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r687016473
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,31 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day, and end at midnight of
Review comment:
```suggestion
its data interval would start at midnight of each day and end at midnight of
```
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,31 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day, and end at midnight of
+the next day.
+
+A DAG run happens *after* its associated data interval has ended, to ensure the
Review comment:
```suggestion
A DAG run is created *after* its associated data interval has ended, to ensure the
```
"happens" is fine, otherwise it should be "is created". The latter might be
a better choice, but really either work.
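The "run is created after its data interval ends" rule being discussed can be made concrete with a short pure-Python sketch (the helper name is hypothetical, not an Airflow API):

```python
from datetime import datetime, timedelta

def daily_data_interval(day: datetime):
    """(start, end) data interval an ``@daily`` run would cover for ``day``."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)

start, end = daily_data_interval(datetime(2020, 1, 1))
# The run for this interval is only created once the interval has ended,
# i.e. not before 2020-01-02 00:00:00.
print(start, end)
```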
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,31 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day, and end at midnight of
+the next day.
+
+A DAG run happens *after* its associated data interval has ended, to ensure the
+run is able to collect all the actual data within the time period. Therefore, a
Review comment:
```suggestion
run is able to collect all the data within the time period. Therefore, a
```
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
+
+.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
+
+
+For our example, let's say a company may want to run a job after each weekday,
+to process data collected during the work day. The first intuitively answer
Review comment:
```suggestion
For our example, let's say a company wants to run a job after each weekday
to process data collected during the work day. The first intuitive answer
```
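The Friday-to-Monday gap described in the quoted paragraph can be verified with a small sketch that emulates when ``0 0 * * 1-5`` fires next (pure Python, no cron library; the function name is made up for illustration):

```python
from datetime import datetime, timedelta

def next_weekday_midnight(after: datetime) -> datetime:
    """Next fire time of the cron expression ``0 0 * * 1-5`` strictly after ``after``."""
    candidate = (after + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday: cron field 1-5 excludes them
        candidate += timedelta(days=1)
    return candidate

# 2021-01-01 was a Friday; the next fire is Monday 2021-01-04, so Friday's
# data would only be processed after the weekend.
print(next_weekday_midnight(datetime(2021, 1, 1)))
```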
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
+
+.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
+
+
+For our example, let's say a company may want to run a job after each weekday,
+to process data collected during the work day. The first intuitively answer
+to this would be ``schedule_interval="0 0 * * 1-5"`` (midnight on Monday to
+Friday), but this means data collected on Friday will *not* be processed right
+after Friday, but on the next Monday, and that run's interval would be from
+midnight Friday to midnight *Monday*.
+
+This is, therefore, a case of the "holes" category; the intended schedule should
+leave the two weekend days. What we want is:
Review comment:
```suggestion
This is, therefore, an example in the "holes" category above; the intended schedule should
not include the two weekend days. What we want is:
```
##########
File path: docs/apache-airflow/concepts/dags.rst
##########
@@ -148,14 +148,24 @@ The ``schedule_interval`` argument takes any value that is a valid `Crontab <htt
with DAG("my_daily_dag", schedule_interval="0 * * * *"):
...
-Every time you run a DAG, you are creating a new instance of that DAG which Airflow calls a :doc:`DAG Run </dag-run>`. DAG Runs can run in parallel for the same DAG, and each has a defined ``execution_date``, which identifies the *logical* date and time it is running for - not the *actual* time when it was started.
+.. tip::
+
+   For more information on ``schedule_interval`` values, see :doc:`DAG Run </dag-run>`.
+
+   If ``schedule_interval`` is not enough to express the DAG's schedule, see :doc:`Timetables </howto/timetable>`.
+
+Every time you run a DAG, you are creating a new instance of that DAG which Airflow calls a :doc:`DAG Run </dag-run>`. DAG Runs can run in parallel for the same DAG, and each has a defined data interval, which identifies the *logical* date and time range it is running for - not the *actual* time when it was started.
 As an example of why this is useful, consider writing a DAG that processes a daily set of experimental data. It's been rewritten, and you want to run it on the previous 3 months of data - no problem, since Airflow can *backfill* the DAG and run copies of it for every day in those previous 3 months, all at once.
-Those DAG Runs will all have been started on the same actual day, but their ``execution_date`` values will cover those last 3 months, and that's what all the tasks, operators and sensors inside the DAG look at when they run.
+Those DAG Runs will all have been started on the same actual day, but their data intervals will cover those last 3 months, and that's what all the tasks, operators and sensors inside the DAG look at when they run.
Review comment:
```suggestion
Those DAG Runs will all have been started on the same actual day, but each DAG run will have a data interval covering a single day in that 3 month period, and that data interval is what all the tasks, operators and sensors inside the DAG look at when they run.
```
I think this is more clear, but I'm not hellbent on this one.
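The backfill behaviour both wordings describe - one run per day, each with its own single-day data interval - can be sketched as follows (illustrative helper, not Airflow's backfill implementation):

```python
from datetime import date, timedelta

def backfill_intervals(start: date, end: date):
    """Yield one (interval_start, interval_end) pair per day in [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)

intervals = list(backfill_intervals(date(2021, 1, 1), date(2021, 4, 1)))
# Every run starts "today", but each covers exactly one day of the period.
print(len(intervals), intervals[0], intervals[-1])
```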
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,31 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day, and end at midnight of
+the next day.
+
+A DAG run happens *after* its associated data interval has ended, to ensure the
+run is able to collect all the actual data within the time period. Therefore, a
+run covering the data period of 2020-01-01 will not start to run until
+2020-01-01 has ended, i.e. 2020-01-02 onwards.
+
+All dates in Airflow are tied to the data interval concept in some way. The
+"logical date" (also called ``execution_date`` from previous Airflow version)
+of a DAG run, for example, usually denotes the start of the data interval, not
+when the DAG is actually executed. Similarly, since the ``start_date`` argument
+for the DAG and its tasks points to the same logical date, a run will only
+be created after that data interval ends. So a DAG with ``@daily`` schedule and
+``start_date`` of 2020-01-01, for example, will not be created until 2020-01-02.
Review comment:
```suggestion
be created after that data interval ends. So a DAG with a ``@daily`` schedule and
a ``start_date`` of 2020-01-01, for example, will not be created until 2020-01-02.
```
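The wording in question can be checked arithmetically: with a ``@daily`` schedule the logical date marks the interval start, and the first run only appears once that interval is over. A sketch under those assumptions (not actual Airflow scheduler code):

```python
from datetime import datetime, timedelta

start_date = datetime(2020, 1, 1)   # the DAG's ``start_date``
logical_date = start_date           # first run's logical date = interval start
interval_end = logical_date + timedelta(days=1)

# The first DAG run is not created before its data interval ends.
earliest_run_creation = interval_end
assert earliest_run_creation == datetime(2020, 1, 2)
```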
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
+
+.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
+
+
+For our example, let's say a company may want to run a job after each weekday,
+to process data collected during the work day. The first intuitively answer
+to this would be ``schedule_interval="0 0 * * 1-5"`` (midnight on Monday to
+Friday), but this means data collected on Friday will *not* be processed right
+after Friday, but on the next Monday, and that run's interval would be from
+midnight Friday to midnight *Monday*.
+
+This is, therefore, a case of the "holes" category; the intended schedule should
+leave the two weekend days. What we want is:
+
+* Schedule a run for each Monday, Tuesday, Wednesday, Thursday, and Friday. The
+  run's data interval would cover from the midnight of each day, to the midnight
Review comment:
```suggestion
run's data interval would cover from midnight of each day, to midnight
```
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
Review comment:
```suggestion
interval would be a more natural representation.
```
Or
```suggestion
interval would make the task specification more natural.
```
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,31 @@ Cron Presets
Your DAG will be instantiated for each schedule along with a corresponding
DAG Run entry in the database backend.
-.. note::
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day, and end at midnight of
+the next day.
+
+A DAG run happens *after* its associated data interval has ended, to ensure the
+run is able to collect all the actual data within the time period. Therefore, a
+run covering the data period of 2020-01-01 will not start to run until
+2020-01-01 has ended, i.e. 2020-01-02 onwards.
+
+All dates in Airflow are tied to the data interval concept in some way. The
+"logical date" (also called ``execution_date`` from previous Airflow version)
Review comment:
```suggestion
"logical date" (also called ``execution_date`` in Airflow versions before
2.2)
```
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
+
+.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
+
+
+For our example, let's say a company may want to run a job after each weekday,
+to process data collected during the work day. The first intuitively answer
+to this would be ``schedule_interval="0 0 * * 1-5"`` (midnight on Monday to
+Friday), but this means data collected on Friday will *not* be processed right
+after Friday, but on the next Monday, and that run's interval would be from
Review comment:
```suggestion
after Friday ends, but on the next Monday, and that run's interval would be from
```
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
+ previous sunlight period.
+* Schedules not following the Gregorian calendar. For example, create a run for
+ each month in the `Traditional Chinese Calendar`_. This is conceptually
+ similar to the sunset case above, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, one may want to
+ have a run each day, but make each run cover the period of the previous seven
+ days. It is possible to "hack" this with a cron expression, but a custom data
+ interval would task specification more natural.
+
+.. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
+
+
+For our example, let's say a company may want to run a job after each weekday,
+to process data collected during the work day. The first intuitively answer
+to this would be ``schedule_interval="0 0 * * 1-5"`` (midnight on Monday to
+Friday), but this means data collected on Friday will *not* be processed right
+after Friday, but on the next Monday, and that run's interval would be from
+midnight Friday to midnight *Monday*.
+
+This is, therefore, a case of the "holes" category; the intended schedule should
+leave the two weekend days. What we want is:
+
+* Schedule a run for each Monday, Tuesday, Wednesday, Thursday, and Friday. The
+  run's data interval would cover from the midnight of each day, to the midnight
+ of the next day.
Review comment:
```suggestion
of the next day (e.g. 2021-01-01 00:00:00 to 2021-01-02 00:00:00).
```
Maybe add an example?
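The two bullets under review - a run per weekday, each covering midnight to the next midnight - boil down to a "next interval" rule that can be sketched in plain Python. This is only the core date arithmetic, not the Airflow ``Timetable`` interface the how-to goes on to implement:

```python
from datetime import datetime, timedelta

def next_workday_interval(last_start: datetime):
    """The weekday data interval following the one that started at ``last_start``."""
    start = last_start + timedelta(days=1)
    while start.weekday() >= 5:  # 5 = Saturday, 6 = Sunday: never start on a weekend
        start += timedelta(days=1)
    return start, start + timedelta(days=1)

# After Friday 2021-01-01's interval, the next interval starts Monday 2021-01-04.
print(next_workday_interval(datetime(2021, 1, 1)))
```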
##########
File path: docs/apache-airflow/howto/timetable.rst
##########
@@ -0,0 +1,63 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Customizing DAG Scheduling with Timetables
+==========================================
+
+A DAG's scheduling strategy is determined by its internal "timetable". This
+timetable can be created by specifying the DAG's ``schedule_interval`` argument,
+as described in :doc:`DAG Run </dag-run>`. The timetable also dictates the data
+interval and the logical time of each run created for the DAG.
+
+However, there are situations when a cron expression or simple ``timedelta``
+periods cannot properly express the schedule. Some of the examples are:
+
+* Data intervals with "holes" between. (Instead of continous, as both the cron
+ expression and ``timedelta`` schedules represent.)
+* Run tasks on different times each day. For example, an astronomer may find it
+ useful to run a task on each sunset, to process data collected from the
Review comment:
```suggestion
* Run tasks at different times each day. For example, an astronomer may find it
  useful to run a task at sunset to process data collected from the
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]