[GitHub] [airflow] mnojek opened a new pull request #21382: Extend documentation for states of DAGs & tasks and update trigger rules docs

GitBox Mon, 07 Feb 2022 02:44:20 -0800


mnojek opened a new pull request #21382:
URL: https://github.com/apache/airflow/pull/21382

**The story behind this change:**
This change is loosely inspired by
[AIP-47](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests),
which is still in the *discussion* phase. In this new design of system tests,
we need to use [Trigger
Rules](https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#trigger-rules)
to handle `teardown` tasks (e.g. for cleaning up resources required by the
recently executed system test). The main concern is that when we use a
`teardown` task with the trigger rule `all_done` (to make sure that it is
executed even if something goes wrong in the middle of the test), the whole
test (which is just a DAG) takes its result from this `teardown` task, and we
can lose the information that some task in the middle failed. This is not the
expected behavior for tests, because we want the test to fail if any step
(task) fails. The whole test gets the status of the `teardown` task, and does
not signal that anything failed in the middle, because of how Airflow works:
the DAG Run status is determined by the status of the "leaf nodes" (the tasks
that do not have any children). Since the `teardown` task is a leaf node, the
whole test gets the same status (which is almost always `success`).
That's why we need another `watcher` task, with its trigger rule set to
`one_failed`, that is downstream of every other task in the test (DAG).
It is triggered if any of the tasks in the DAG fails, and its status is then
propagated to the DAG Run (because it is a leaf node).
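
To make the reasoning above concrete, here is a minimal sketch in plain Python. It is a simplified model, not Airflow code: the function `simulate`, the task names, and the `raises` parameter are all hypothetical, and only the three trigger rules mentioned here are modelled. It assumes the `watcher` task always fails when it actually runs, which is what lets it turn the DAG Run red.

```python
# Simplified model of trigger rules and the "leaf node" rule.
# NOT Airflow's implementation; names and structure are illustrative only.
FAILED_STATES = {"failed", "upstream_failed"}


def simulate(tasks, raises):
    """Return (task_states, dag_run_state).

    tasks:  dict of name -> (trigger_rule, [upstream task names]),
            listed in topological order.
    raises: set of task names whose own code fails when it runs.
    """
    states = {}
    for name, (rule, upstream) in tasks.items():
        any_failed = any(states[u] in FAILED_STATES for u in upstream)
        if rule == "all_success" and any_failed:
            states[name] = "upstream_failed"  # never runs
        elif rule == "one_failed" and not any_failed:
            states[name] = "skipped"  # nothing failed, so it is not triggered
        else:
            # "all_done" always lands here: it runs regardless of upstream state.
            states[name] = "failed" if name in raises else "success"

    # Leaf nodes are tasks that are not upstream of any other task.
    parents = {u for _, upstream in tasks.values() for u in upstream}
    leaves = [n for n in tasks if n not in parents]
    failed = any(states[leaf] in FAILED_STATES for leaf in leaves)
    return states, "failed" if failed else "success"


plain = {
    "setup": ("all_success", []),
    "test_step": ("all_success", ["setup"]),
    "teardown": ("all_done", ["test_step"]),
}
# The watcher is downstream of every other task, so it becomes the only leaf.
with_watcher = dict(
    plain, watcher=("one_failed", ["setup", "test_step", "teardown"])
)

_, no_watcher_state = simulate(plain, raises={"test_step"})
_, watcher_state = simulate(with_watcher, raises={"test_step", "watcher"})
_, healthy_state = simulate(with_watcher, raises={"watcher"})

print(no_watcher_state)  # success: the failure of test_step is hidden
print(watcher_state)     # failed: the watcher ran, failed, and is a leaf
print(healthy_state)     # success: the watcher was skipped
```

In this model the first scenario reproduces the problem (a failed step hidden behind a successful `teardown`), and the other two show the `watcher` surfacing a failure without breaking a healthy run.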

While researching the documentation and code, I found it very difficult to
find detailed information about how trigger rules work, which is why I thought
it would be good to extend the documentation for them. Since I had already
spent some time understanding them in depth, I also prepared this PR. It's not
big, but it took me several hours to prepare these statements. I am not sure
that all of them are correct, so please read them carefully and correct me if
I'm wrong, and I will edit the PR.

I would also like to start a discussion about trigger rules. They seemed
simple to me at first, but the more time I spent with them, the more I
realized that they introduce a lot of complications to task execution. I
hope that this PR will make it easier to understand how they work. If you have
any ideas for how we can make them even better, I am glad to discuss them.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.