Re: [PROPOSAL] New operator for "watcher" scenario

Ferruzzi, Dennis Thu, 10 Feb 2022 10:45:04 -0800

I like the concept, but don't have strong opinions on implementation.  One 
thought would be that a similar operator may be handy if we're making that 
change.  "run task ONLY IF an upstream task failed"  might compliment your 
proposed "run task EVEN IF an upstream task failed".  For example, notify me 
ONLY IF an upstream task needs attention, or a task that might have some 
automated remediation steps to take if the previous step failed.  Not sure if 
that would be easy to implement "while you're in there" or if that would be a 
whole other can of worms.

________________________________
From: Jarek Potiuk <[email protected]>
Sent: Thursday, February 10, 2022 6:35 AM
To: [email protected]
Subject: [EXTERNAL] [PROPOSAL] New operator for "watcher" scenario

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.

Hey everyone,

I have a small proposal about adding a new overloaded operator - initial 
proposal '>>=` (and a method), resulting from our discussion on AIP-47 - system 
tests refactoring 
(https://lists.apache.org/thread/htd4013yn483qfhwv11vc26jpf2yvjph).

The ">>=" operator is not really 100% necessary to complete the AIP-47 (we have 
ways around it with somewhat complex-ish pre-commits) but it might simplify the 
way how we approach the example dags turned into system tests and make them 
more maintainable - but also it might simplify "real" DAGS for some users.

Context:

In the AIP-47 we have a need for "status" like functionality in system tests. 
The basic idea is to encompass all "test code" in a single file - example dag. 
This way each system test (which are currently spread among DAG files, pytest 
tests and configuration will "shrink" to the single "example_dag" file. This is 
a great simplification and it will be extremely helpful in system tests but it 
means that we need a "status" like functionality in the DAGs that will fail in 
case any of the tasks failed during the execution.

In a number of DAGs we have to do "cleanup/tearDown" as the last step no matter 
if any of the tasks failed (and this can be easily done with "leaf" all_done 
rule) but then such a cleanup operation determines the status of the DAG 
("failed" succeeded". So we need to by-pass the "cleanup" task and in case of 
any task failure execute the "watcher" task (as we named it) as a leaf-node to 
determine the status of the whole DAG.

Proposed Solution:

Maybe there is an easier way than what we came up with (happy to hear it), but 
the idea we have is to have a "watcher" leaf node task that has "one_failed" 
dependency on all the tasks in the example DAG. This leads to all a bit more 
complex example dags to end up with:

[task1, task2, task3, ... ] >> watcher

This is a bit brittle because it is long and you have to remember to add new 
tasks added in the example dags in the future (and it's very easy to miss it as 
there is no "problem" if you miss a task). We can automate it in pre-commits 
for example dags, but we could easily solve it by adding a new operator, that 
would add "one_failed" dependency between all other tasks to the watcher task:

Pseudocode:

I proposed >>= operator but there also should be a "set_" method):

dag >>= watcher

the operator:

for task in dag.task:
    if not task == watcher:
        task >> watcher

The operator might also be needed in other scenarios - when someone wants to 
send notifications when any of the tasks failed 
(https://stackoverflow.com/questions/50959743/airflow-trigger-rule-using-one-failed-cause-dag-failure)
or (I imagine) when you want to make sure to teardown your whole infrastructure 
you set-up in a complex DAG with cases similar to the case Cloudflare presented 
last year in the summit 
https://airflowsummit.org/sessions/2021/provision-as-a-service/ . Some of that 
can be done with cluster policies etc. But I thought this might be a much nicer 
way.

WDYT? Are there any better/simpler ways of solving this problem that we are not 
aware of ? Do you think it is "enough" of justification to add a new 
method/overloaded operator like that? Are there any strong "no", or maybe we 
could agree to it via lazy consensus if there are some supportive voices and no 
viable alternative?

BTW. We are also open to change (or drop) the ">>=" proposed operator if 
someone thinks this might be confusing.

J.

Re: [PROPOSAL] New operator for "watcher" scenario

Reply via email to