baolsen commented on a change in pull request #6999: [AIRFLOW-XXXX] Clarify wait_for_downstream and execution_date
URL: https://github.com/apache/airflow/pull/6999#discussion_r362747213
##########
File path: docs/concepts.rst
##########
@@ -113,13 +116,138 @@ DAGs can be used as context managers to automatically assign new operators to th
op.dag is dag # True
-.. _concepts-operators:
+.. _concepts:dagruns:
+
+DAG Runs
+========
+
+A DAG run is a physical instance of a DAG, containing task instances that run for a specific ``execution_date``.
+
+A DAG run is usually created by the Airflow scheduler, but can also be created by an external trigger.
+Multiple DAG runs may be running at once for a particular DAG, each of them having a different ``execution_date``.
+For example, we might currently have two DAG runs in progress: one for 2016-01-01 and one for 2016-01-02.
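To make the idea of several concurrent runs concrete, here is a minimal sketch in plain Python. It is a toy model, not Airflow's own DAG run class: the names ``DagRun``, ``RunRegistry``, ``create`` and ``running`` are hypothetical, and the sketch assumes only that runs are keyed by DAG and ``execution_date``, so two runs of the same DAG never share a date.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class DagRun:
    # Hypothetical toy model of a DAG run, not Airflow's real class.
    dag_id: str
    execution_date: datetime
    external_trigger: bool = False  # True if created outside the scheduler
    state: str = "running"


class RunRegistry:
    """Hypothetical store holding at most one run per (dag_id, execution_date)."""

    def __init__(self) -> None:
        self._runs: Dict[Tuple[str, datetime], DagRun] = {}

    def create(self, dag_id: str, execution_date: datetime,
               external_trigger: bool = False) -> DagRun:
        key = (dag_id, execution_date)
        if key in self._runs:
            raise ValueError(f"{dag_id} already has a run for {execution_date}")
        run = DagRun(dag_id, execution_date, external_trigger)
        self._runs[key] = run
        return run

    def running(self, dag_id: str) -> List[DagRun]:
        # All in-progress runs of one DAG, ordered by logical date.
        runs = [r for r in self._runs.values()
                if r.dag_id == dag_id and r.state == "running"]
        return sorted(runs, key=lambda r: r.execution_date)


registry = RunRegistry()
registry.create("my_dag", datetime(2016, 1, 1))                         # by the scheduler
registry.create("my_dag", datetime(2016, 1, 2), external_trigger=True)  # by a trigger
print([r.execution_date.date().isoformat() for r in registry.running("my_dag")])
# ['2016-01-01', '2016-01-02']
```

Both runs are "in progress" at the same time; what tells them apart is their ``execution_date``, not when they were created.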
+
+.. _concepts:execution_date:
+
+execution_date
+--------------
+
+The ``execution_date`` is the *logical* date and time for which the DAG run, and its task instances, are running.
+
+This allows task instances to process data for the desired *logical* date and time.
+While a task instance or DAG run might have a *physical* start date of now,
+its *logical* date might be three months ago, for example because we are reloading historical data.
+
+In the earlier example of two DAG runs, the ``execution_date`` is 2016-01-01 for the first DAG run and 2016-01-02 for the second.
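The split between logical and physical dates can be sketched as follows. This is an illustration under assumptions, not Airflow code: ``partition_path`` and ``run_task`` are hypothetical helpers, and the ``data/YYYY-MM-DD.csv`` file layout is invented for the example. The point is that the task picks its input from ``execution_date``, while ``start_date`` only records when it physically ran.

```python
from datetime import datetime


def partition_path(execution_date: datetime) -> str:
    # Hypothetical layout: one input file per logical day.
    return f"data/{execution_date:%Y-%m-%d}.csv"


def run_task(execution_date: datetime, now: datetime) -> dict:
    return {
        "start_date": now,                 # physical: when the task actually ran
        "execution_date": execution_date,  # logical: which period it processes
        "input": partition_path(execution_date),
    }


# Reloading January data at the start of April.
result = run_task(datetime(2016, 1, 1), now=datetime(2016, 4, 1, 9, 30))
print(result["input"])                                         # data/2016-01-01.csv
print((result["start_date"] - result["execution_date"]).days)  # 91
```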
+
+A DAG run and all task instances created within it share the same ``execution_date``, so
+logically you can think of a DAG run as simulating the DAG running all of its tasks at the
+earlier date and time given by the ``execution_date``.
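A minimal sketch of "one run, one date", assuming nothing about Airflow internals: the hypothetical ``create_dag_run`` helper builds a run and one task instance per task, and stamps every task instance with the run's ``execution_date``. ``DagRun`` and ``TaskInstance`` here are toy dataclasses, not Airflow's models.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TaskInstance:
    # Toy model: one task of one DAG at one logical date.
    dag_id: str
    task_id: str
    execution_date: datetime


@dataclass
class DagRun:
    # Toy model: a run groups the task instances for a single execution_date.
    dag_id: str
    execution_date: datetime
    task_instances: List[TaskInstance] = field(default_factory=list)


def create_dag_run(dag_id: str, task_ids: List[str],
                   execution_date: datetime) -> DagRun:
    """Create a run plus one task instance per task, all with the run's date."""
    run = DagRun(dag_id, execution_date)
    for task_id in task_ids:
        run.task_instances.append(TaskInstance(dag_id, task_id, execution_date))
    return run


run = create_dag_run("my_dag", ["task_1", "task_2"], datetime(2016, 1, 1))
print(all(ti.execution_date == run.execution_date for ti in run.task_instances))  # True
```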
+
+.. _concepts:tasks:
+
+Tasks
+=====
+
+A task defines a unit of work within a DAG; it is represented as a node in the DAG, and it is written in Python.
+
+Each task is created by instantiating an Operator class, for example a ``PythonOperator`` to execute some Python code,
+or a ``BashOperator`` to run a Bash command.
+
+The task configures its operator by supplying specific values for that operator's parameters,
+such as a Python callable (``python_callable``) in the case of ``PythonOperator`` or a Bash command (``bash_command``) in the case of ``BashOperator``.
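The operator/task split can be illustrated with stand-in classes. This is a toy sketch, not Airflow's implementation: ``ToyOperator``, ``ToyPythonOperator`` and ``ToyBashOperator`` are hypothetical, and only the parameter names ``python_callable`` and ``bash_command`` mirror the real operators. The operator class says *how* to do a kind of work; each task supplies the task-specific values. Note that the Bash stand-in runs its command through the system shell (``/bin/sh`` on POSIX), which is not necessarily Bash.

```python
import subprocess
from typing import Any, Callable


class ToyOperator:
    """Hypothetical base class: knows how to run one kind of work."""

    def __init__(self, task_id: str):
        self.task_id = task_id

    def execute(self, context: dict) -> Any:
        raise NotImplementedError


class ToyPythonOperator(ToyOperator):
    def __init__(self, task_id: str, python_callable: Callable[..., Any]):
        super().__init__(task_id)
        self.python_callable = python_callable  # task-specific value

    def execute(self, context: dict) -> Any:
        # Pass the context entries to the callable as keyword arguments.
        return self.python_callable(**context)


class ToyBashOperator(ToyOperator):
    def __init__(self, task_id: str, bash_command: str):
        super().__init__(task_id)
        self.bash_command = bash_command  # task-specific value

    def execute(self, context: dict) -> str:
        # Run the command through the system shell and return its stdout.
        result = subprocess.run(self.bash_command, shell=True,
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()


# Two tasks: generic operator classes, each configured with its own values.
greet = ToyPythonOperator("greet", python_callable=lambda ds: f"processing {ds}")
echo = ToyBashOperator("echo", bash_command="echo hello")
print(greet.execute({"ds": "2016-01-01"}))  # processing 2016-01-01
print(echo.execute({}))                     # hello
```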
+
+Relations between Tasks
+-----------------------
+
+Consider the following DAG with two tasks.
+Each task is a node in our DAG, and there is a dependency from task_1 to task_2:
+
+.. code:: python
+
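+    from datetime import datetime
+    from airflow import DAG
+    from airflow.operators.dummy_operator import DummyOperator
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        task_1 = DummyOperator(task_id='task_1')
+        task_2 = DummyOperator(task_id='task_2')
+        task_1 >> task_2  # task_1 is upstream of task_2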
+
+We can say that task_1 is *upstream* of task_2, and conversely task_2 is *downstream* of task_1.
+When a DAG run is created, task_1 starts running, and task_2 waits for task_1 to complete successfully before it can start.
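The sketch below mimics how ``task_1 >> task_2`` records an upstream/downstream edge and how that edge constrains execution order. It is a toy model under assumptions: ``Node`` and ``run_order`` are hypothetical names, and the ordering loop is a simple topological sort, not Airflow's scheduler logic.

```python
from typing import List, Set


class Node:
    """Hypothetical task node supporting ``>>`` dependencies."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.upstream: Set["Node"] = set()
        self.downstream: Set["Node"] = set()

    def __rshift__(self, other: "Node") -> "Node":
        # a >> b: b depends on a, i.e. a is upstream of b.
        self.downstream.add(other)
        other.upstream.add(self)
        return other  # returning `other` allows chaining: a >> b >> c


def run_order(nodes: List[Node]) -> List[str]:
    """Return task_ids so that every task comes after all of its upstream tasks."""
    done: List[str] = []
    remaining = list(nodes)
    while remaining:
        ready = [n for n in remaining
                 if all(u.task_id in done for u in n.upstream)]
        if not ready:
            raise ValueError("cycle detected")
        for n in ready:
            done.append(n.task_id)
            remaining.remove(n)
    return done


task_1, task_2 = Node("task_1"), Node("task_2")
task_1 >> task_2
print(run_order([task_2, task_1]))  # ['task_1', 'task_2']
```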
+
+Task Instances
+==============
+
+A task instance represents a specific run of a task and is characterized as the
+combination of a DAG, a task, and a point in time (``execution_date``). Task instances
+also have a state, such as "running", "success", "failed", "skipped", or "up for retry".
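A task instance can be modelled as a record identified by DAG, task and ``execution_date``, plus a state. The sketch below is hypothetical: ``TIState``, ``TaskInstance`` and ``FINISHED`` are toy names, the enum covers only the states listed above, and its string values are assumed for illustration rather than taken from Airflow.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Tuple


class TIState(str, Enum):
    # Subset of the states named above; string values assumed for this sketch.
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UP_FOR_RETRY = "up_for_retry"


FINISHED = {TIState.SUCCESS, TIState.FAILED, TIState.SKIPPED}


@dataclass
class TaskInstance:
    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[TIState] = None  # None: not yet started

    @property
    def key(self) -> Tuple[str, str, datetime]:
        # A task instance is identified by DAG + task + point in time.
        return (self.dag_id, self.task_id, self.execution_date)

    @property
    def finished(self) -> bool:
        # "running" and "up for retry" instances still have work left to do.
        return self.state in FINISHED


ti = TaskInstance("my_dag", "task_1", datetime(2016, 1, 1))
ti.state = TIState.UP_FOR_RETRY
print(ti.key, ti.state.value, ti.finished)
```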
+
+Tasks are defined in DAGs, and both are written in Python code to define what you want to do.
+Task instances belong to DAG runs, have an associated ``execution_date``, and are physical, runnable entities.
+
+Relations between Task Instances
+--------------------------------
+
+Again consider the following tasks, defined for some DAG:
+
+.. code:: python
+
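+    from datetime import datetime
+    from airflow import DAG
+    from airflow.operators.dummy_operator import DummyOperator
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        task_1 = DummyOperator(task_id='task_1')
+        task_2 = DummyOperator(task_id='task_2')
+        task_1 >> task_2  # task_1 is upstream of task_2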
+
+When we enable this DAG, the scheduler creates several DAG runs: one with an ``execution_date`` of 2016-01-01,
+one with an ``execution_date`` of 2016-01-02, and so on up to the current date.
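Which ``execution_date`` values get a DAG run can be sketched with a small date loop. This is a simplified model under stated assumptions: ``scheduled_execution_dates`` is a hypothetical helper, the schedule is taken to be daily with catch-up enabled, and, following our understanding of Airflow's scheduler, a run for a period is only created once that period has ended (``execution_date + interval <= now``). Check the scheduler documentation for your Airflow version before relying on the exact boundary.

```python
from datetime import datetime, timedelta
from typing import List


def scheduled_execution_dates(start_date: datetime, now: datetime,
                              interval: timedelta = timedelta(days=1)) -> List[datetime]:
    """Execution dates a catch-up scheduler would create runs for (toy model).

    A run for [execution_date, execution_date + interval) is only created
    once that interval has ended.
    """
    dates = []
    current = start_date
    while current + interval <= now:
        dates.append(current)
        current += interval
    return dates


dates = scheduled_execution_dates(datetime(2016, 1, 1), now=datetime(2016, 1, 4, 12))
print([d.date().isoformat() for d in dates])
# ['2016-01-01', '2016-01-02', '2016-01-03']
```

Under this assumption, the most recent run's ``execution_date`` trails the current date by one interval.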
+
+Each DAG run will contain a task_1 task instance and a task_2 task instance. Both task instances will
+have an ``execution_date`` equal to the DAG run's ``execution_date``, and each task_2 will be *downstream* of
+(depend on) its task_1.
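The dependency check inside a single DAG run can be sketched as a lookup keyed by task and ``execution_date``. The names ``UPSTREAM``, ``states`` and ``can_start`` are hypothetical, and the rule "all upstream task instances succeeded" is assumed as the default behaviour; the sketch ignores other trigger rules.

```python
from datetime import datetime
from typing import Dict, Set, Tuple

# Hypothetical task graph for my_dag: task_id -> set of upstream task_ids.
UPSTREAM: Dict[str, Set[str]] = {"task_1": set(), "task_2": {"task_1"}}

# Task instance states keyed by (task_id, execution_date).
states: Dict[Tuple[str, datetime], str] = {}


def can_start(task_id: str, execution_date: datetime) -> bool:
    """True once every upstream task instance in the *same* DAG run succeeded."""
    return all(states.get((up, execution_date)) == "success"
               for up in UPSTREAM[task_id])


jan1, jan2 = datetime(2016, 1, 1), datetime(2016, 1, 2)
states[("task_1", jan1)] = "success"
print(can_start("task_2", jan1))  # True: task_1 for 2016-01-01 succeeded
print(can_start("task_2", jan2))  # False: task_1 for 2016-01-02 has not
```

Because the lookup key includes ``execution_date``, the success of task_1 on 2016-01-01 says nothing about task_2 on 2016-01-02.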
+
+We can also say that task_1 for 2016-01-01 is the *previous* task instance of task_1 for 2016-01-02,
+or that the DAG run for 2016-01-01 is the *previous* DAG run to the DAG run for 2016-01-02.
+Here, *previous* refers to the run with the logically prior ``execution_date``, which by default runs independently of the current run,
+while *upstream* refers to a dependency within the same DAG run, sharing the same ``execution_date``.
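The difference between *previous* and *upstream* comes down to which part of the key varies. In this hypothetical sketch (``TIKey``, ``UPSTREAM``, ``previous_ti`` and ``upstream_tis`` are illustrative names, not Airflow APIs), the previous task instance keeps the task and moves back in ``execution_date``, while upstream task instances keep the ``execution_date`` and change the task.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple

TIKey = Tuple[str, datetime]  # (task_id, execution_date) within one DAG

# Hypothetical task graph for my_dag: task_id -> upstream task_ids.
UPSTREAM: Dict[str, Set[str]] = {"task_1": set(), "task_2": {"task_1"}}


def previous_ti(key: TIKey, existing: List[TIKey]) -> Optional[TIKey]:
    """Same task, latest earlier execution_date: lives in a *different* DAG run."""
    task_id, execution_date = key
    earlier = [k for k in existing if k[0] == task_id and k[1] < execution_date]
    return max(earlier, key=lambda k: k[1]) if earlier else None


def upstream_tis(key: TIKey) -> List[TIKey]:
    """Upstream tasks with the same execution_date: lives in the *same* DAG run."""
    task_id, execution_date = key
    return [(up, execution_date) for up in sorted(UPSTREAM[task_id])]


days = [datetime(2016, 1, d) for d in (1, 2, 3)]
existing = [(task_id, day) for day in days for task_id in ("task_1", "task_2")]
key = ("task_2", datetime(2016, 1, 2))
print(previous_ti(key, existing))  # task_2 for 2016-01-01
print(upstream_tis(key))           # task_1 for 2016-01-02
```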
+
Review comment:
Not sure how you feel about this note, but I think differentiating between
previous and upstream is important, especially for a new user. The concepts of
upstream/downstream tasks didn't sink in for me at first, and explicitly
distinguishing them from previous would have helped me understand immediately.