ashb commented on a change in pull request #6295: [AIRFLOW-XXX] Adding Task 
re-run documentation
URL: https://github.com/apache/airflow/pull/6295#discussion_r333925028
 
 

 ##########
 File path: docs/dag-run.rst
 ##########
 @@ -0,0 +1,193 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+DAG Runs
+=========
+A DAG Run is an object representing an instantiation of the DAG in time.
+
+Each DAG may or may not have a schedule, which informs how ``DAG Runs`` are
+created. ``schedule_interval`` is defined as a DAG argument, and preferably
+receives a
+`cron expression <https://en.wikipedia.org/wiki/Cron#CRON_expression>`_ as
+a ``str``, or a ``datetime.timedelta`` object. Alternatively, you can
+use one of these cron "presets":
+
++--------------+----------------------------------------------------------------+---------------+
+| preset       | meaning                                                        | cron          |
++==============+================================================================+===============+
+| ``None``     | Don't schedule, use for exclusively "externally triggered"     |               |
+|              | DAGs                                                           |               |
++--------------+----------------------------------------------------------------+---------------+
+| ``@once``    | Schedule once and only once                                    |               |
++--------------+----------------------------------------------------------------+---------------+
+| ``@hourly``  | Run once an hour at the beginning of the hour                  | ``0 * * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@daily``   | Run once a day at midnight                                     | ``0 0 * * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@weekly``  | Run once a week at midnight on Sunday morning                  | ``0 0 * * 0`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@monthly`` | Run once a month at midnight of the first day of the month     | ``0 0 1 * *`` |
++--------------+----------------------------------------------------------------+---------------+
+| ``@yearly``  | Run once a year at midnight of January 1                       | ``0 0 1 1 *`` |
++--------------+----------------------------------------------------------------+---------------+
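
The preset table above can be read as a plain lookup from preset name to cron expression. The sketch below only restates the table in Python; `CRON_PRESETS` and `resolve_schedule` are hypothetical names chosen for illustration and are not presented as Airflow's own API.

```python
from typing import Optional

# Hypothetical lookup that mirrors the preset table; not Airflow's own code.
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def resolve_schedule(schedule: Optional[str]) -> Optional[str]:
    """Return the cron expression for a preset, or the input unchanged.

    ``None`` (externally triggered only) and ``@once`` have no cron
    equivalent in the table, so they are passed through as-is.
    """
    if schedule is None:
        return None
    return CRON_PRESETS.get(schedule, schedule)


print(resolve_schedule("@weekly"))
```

Anything that is not a known preset, such as a hand-written cron string, is returned untouched in this sketch.
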
+
+Your DAG will be instantiated for each schedule along with a corresponding
+``DAG Run`` entry in the backend.
+
+**Note**: If you run a DAG on a ``schedule_interval`` of one day, the run stamped 2020-01-01
+will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
+started once the period it covers has ended. The ``execution_date`` passed to the DAG
+will also be 2020-01-01.
+
+The first ``DAG Run`` is created based on the minimum ``start_date`` for the tasks in your DAG.
+Subsequent ``DAG Runs`` are created sequentially by the scheduler process, based on your DAG’s ``schedule_interval``.
+If your ``start_date`` is 2020-01-01 and ``schedule_interval`` is ``@daily``, the first run
+will be created on 2020-01-02, i.e. after your start date has passed.
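
The timing rule described above (a run is created only after the interval it covers has ended) can be sketched with plain `datetime` arithmetic. This is an illustration under the assumption of a fixed `timedelta` interval; `earliest_trigger_time` is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def earliest_trigger_time(execution_date: datetime, interval: timedelta) -> datetime:
    """Return the end of the interval that starts at execution_date.

    A run stamped with execution_date becomes eligible only once
    the period it covers has ended.
    """
    return execution_date + interval


start_date = datetime(2020, 1, 1)
daily = timedelta(days=1)

# The run stamped 2020-01-01 becomes eligible at the end of that day.
first_trigger = earliest_trigger_time(start_date, daily)
print(first_trigger)  # 2020-01-02 00:00:00
```
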
+
+Re-run DAG
+''''''''''
+There can be cases where you will want to execute your DAG again. One such case is when the scheduled
+DAG run fails. Another is when the scheduled DAG run wasn't executed due to low resources or the DAG being turned off.
+
+Catchup
+-------
+
+An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``schedule_interval`` defines a
+series of intervals which the scheduler turns into individual DAG Runs and executes. A key capability
+of Airflow is that these DAG Runs are atomic and idempotent items. The scheduler, by default, will
+kick off a DAG Run for any interval that has not been run (or has been cleared). This concept is called Catchup.
+
+If your DAG is written to handle its own catchup (i.e. not limited to the interval, but instead to ``Now`` for instance),
+then you will want to turn catchup off. This can be done by setting ``catchup = False`` in the DAG or ``catchup_by_default = False``
+in the configuration file. When turned off, the scheduler creates a DAG run only for the latest interval.
+
+.. code:: python
+
+    """
+    Code that goes along with the Airflow tutorial located at:
+    https://github.com/apache/airflow/blob/master/airflow/example_dags/tutorial.py
+    """
+    from airflow import DAG
+    from airflow.operators.bash_operator import BashOperator
+    from datetime import datetime, timedelta
+
+
+    default_args = {
+        'owner': 'Airflow',
+        'depends_on_past': False,
+        'start_date': datetime(2015, 12, 1),
+        'email': ['airf...@example.com'],
+        'email_on_failure': False,
+        'email_on_retry': False,
+        'retries': 1,
+        'retry_delay': timedelta(minutes=5)
+    }
+
+    dag = DAG(
+        'tutorial',
+        default_args=default_args,
+        description='A simple tutorial DAG',
+        schedule_interval='@daily',
+        catchup=False)
+
+In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM
+(or from the command line), a single DAG Run will be created, with an ``execution_date`` of 2016-01-01,
+and the next one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.
+
+If the ``dag.catchup`` value had been ``True`` instead, the scheduler would have created a DAG Run
+for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02,
+as that interval hasn’t completed) and the scheduler would execute them sequentially.
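
The difference between the two catchup settings can be made concrete with the dates from the example. The sketch below is a simplified model, not the scheduler's implementation: it assumes a fixed daily `timedelta` interval, and `due_execution_dates` is a hypothetical helper introduced only for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def due_execution_dates(
    start_date: datetime, now: datetime, interval: timedelta, catchup: bool
) -> List[datetime]:
    """List execution dates whose interval has fully completed by `now`.

    With catchup enabled every completed interval is returned;
    with catchup disabled only the latest completed one is kept.
    """
    dates = []
    current = start_date
    # An interval is complete once its end (current + interval) has passed.
    while current + interval <= now:
        dates.append(current)
        current += interval
    return dates if catchup else dates[-1:]


start = datetime(2015, 12, 1)
picked_up = datetime(2016, 1, 2, 6)  # scheduler picks up the DAG at 6 AM
daily = timedelta(days=1)

with_catchup = due_execution_dates(start, picked_up, daily, catchup=True)
without_catchup = due_execution_dates(start, picked_up, daily, catchup=False)

print(len(with_catchup), with_catchup[0].date(), with_catchup[-1].date())
print([d.date().isoformat() for d in without_catchup])
```

In this model the catchup case yields one run per day from 2015-12-01 through 2016-01-01, while the non-catchup case yields only the run for 2016-01-01, matching the description above.
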
+
+Catchup is also triggered when you turn off a DAG for a specified period of time and then re-enable it.
+
+This behavior is great for atomic datasets that can easily be split into periods. Turning catchup off is great
+if your DAG Runs perform backfill internally.
+
+
+Backfill
+---------
+There can be cases when you may want to run the DAG for a specified historical period, e.g. a data pipeline
+which dumps data in a DFS every day and another pipeline which requires the last month of data in the DFS.
+This is known as Backfill.
+
+You may want to backfill the data even in cases when catchup is disabled. This can be done through the CLI.
+Run the command below:
+
+.. code:: bash
+
+    airflow dags backfill -s START_DATE -e END_DATE dag_id
+
+The above command will re-run all the instances of the ``dag_id`` for all the intervals within the start date and end date.
+
+Re-run Tasks
+------------
+It can happen that some of the tasks fail during the scheduled run. Once you have fixed
+the errors after going through the logs, you can re-run the tasks by clearing them for the
+scheduled date. Clearing a task instance doesn't delete the task instance record.
+Instead, it updates ``max_tries`` to ``0`` and sets the current task instance state to ``None``, which forces the task to re-run.
+
+Select the failed task and click on **Clear**. This will change the state of the task from
+failed to ``None`` and the executor will re-run it.
+
+There are multiple options you can select to re-run:
+
+* Past - All the instances of the task in the runs before the current DAG's execution date
+* Future - All the instances of the task in the runs after the current DAG's execution date
+* Upstream - The upstream tasks in the current DAG
+* Downstream - The downstream tasks in the current DAG
+* Recursive - All the tasks in the child DAGs and parent DAGs
+* Failed - Only the failed tasks in the current DAG
+
+You can also clear the task through the CLI using the command:
+
+.. code:: bash
+
+    airflow tasks clear dag_id -t task_regex -s START_DATE -e END_DATE
+
+This will clear all instances of the tasks matching the regex for the ``dag_id`` which have run
+between the given start and end dates. For more options, you can run the command:
+
+.. code:: bash
+
+    airflow tasks clear -h
+
+**Note**: When clearing a set of tasks’ state in hope of getting them to re-run, it is important
+to keep in mind the DAG Run’s state too as it defines whether the scheduler should look
+into triggering tasks for that run.
+
+
+External Triggers
+'''''''''''''''''
+
+Note that ``DAG Runs`` can also be created manually through the CLI. Just run the command:
+
+.. code:: bash
+
+    airflow dags trigger -e execution_date run_id
+
+The ``DAG Runs`` created externally to the scheduler get associated with the trigger’s timestamp, and will be displayed
 
 Review comment:
   (This applies to the whole document, not just here)
