Hi Airflow community!

TL;DR: Is there a recommended way to define a failure handler that *only* triggers on FAILED, not on UPSTREAM_FAILED?

In more detail: we have a DAG that is essentially this:

set_up -> do_stuff_1 -> do_stuff_2 -> do_stuff_3 -> cleanup

If set_up fails, nothing needs to be done; if do_stuff_1, _2 or _3 fail, we would like to trigger the cleanup task.

Our first approach was to create essentially a copy of the cleanup operator, but with trigger rule ONE_FAILED, and to attach that to do_stuff_1, _2 and _3. However, that didn't quite work as we expected (on 1.7.1.3) because if set_up failed, do_stuff_1 and the others would be set to UPSTREAM_FAILED, and this would trigger the failure handler.
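To make the problem concrete, here is a toy model of what we saw. It is plain Python, not Airflow code. The state strings mirror Airflow's task states, and `propagate`, `one_failed` and `failed_only` are hypothetical helpers. It assumes, as we observed on 1.7.1.3, that a failure marks every downstream task UPSTREAM_FAILED and that ONE_FAILED treats UPSTREAM_FAILED parents as failures.

```python
# Toy model of failure propagation in a linear chain; not Airflow's API.
# State names mirror Airflow's task instance states.
SUCCESS = "success"
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"

CHAIN = ["set_up", "do_stuff_1", "do_stuff_2", "do_stuff_3", "cleanup"]

# The failure handler copy is attached to these three tasks.
HANDLER_PARENTS = ["do_stuff_1", "do_stuff_2", "do_stuff_3"]


def propagate(chain, failing_task):
    """Return task -> state when `failing_task` fails and the rest would succeed."""
    states = {}
    failed_upstream = False
    for task in chain:
        if failed_upstream:
            # Everything after the failure never runs: marked upstream_failed.
            states[task] = UPSTREAM_FAILED
        elif task == failing_task:
            states[task] = FAILED
            failed_upstream = True
        else:
            states[task] = SUCCESS
    return states


def one_failed(parent_states):
    """ONE_FAILED as we observed it: FAILED and UPSTREAM_FAILED both count."""
    return any(s in (FAILED, UPSTREAM_FAILED) for s in parent_states)


def failed_only(parent_states):
    """The rule we are after: only a genuine FAILED counts."""
    return any(s == FAILED for s in parent_states)


for failing in ["set_up", "do_stuff_2"]:
    states = propagate(CHAIN, failing)
    parents = [states[t] for t in HANDLER_PARENTS]
    print(failing, "fails -> ONE_FAILED:", one_failed(parents),
          "| FAILED only:", failed_only(parents))
# set_up fails -> ONE_FAILED: True | FAILED only: False
# do_stuff_2 fails -> ONE_FAILED: True | FAILED only: True
```

The second predicate is the FAILED-only behavior we're after; the question is whether Airflow offers a trigger rule that expresses it.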

We didn't try the approach in [1], of using the on_failure_callback to call the operator directly, because we didn't like the idea of circumventing the "normal" task execution path. We wanted to avoid a mix of tasks that run normally and appear in the task instance overview etc., and tasks that are essentially invoked "out of band". Also, we weren't clear on the semantics of templated parameters in such a situation.

What we have right now is a slightly modified flow:

set_up -> short_circuit -> do_stuff_1 -> do_stuff_2 -> do_stuff_3 -> cleanup

The short_circuit operator is configured to trigger on ALL_DONE, and succeeds or fails based on the task instance status of the set_up task. That kind of does the trick, in that the failure handler (which is still attached to do_stuff_1, _2 and _3) does not trigger if the short_circuit fails.
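Our reading of why this works, again as a plain-Python toy model: we're assuming short_circuit behaves like Airflow's ShortCircuitOperator, so a false condition marks the downstream tasks SKIPPED rather than failing them, and ONE_FAILED does not count SKIPPED parents. `states_after_short_circuit` is a hypothetical helper, not Airflow code.

```python
# Toy model of the short_circuit flow; plain Python, not Airflow's API.
SUCCESS = "success"
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"
SKIPPED = "skipped"

WORK = ["do_stuff_1", "do_stuff_2", "do_stuff_3"]


def one_failed(parent_states):
    """ONE_FAILED as we observed it: FAILED and UPSTREAM_FAILED both count."""
    return any(s in (FAILED, UPSTREAM_FAILED) for s in parent_states)


def states_after_short_circuit(set_up_state):
    """States of the tasks downstream of short_circuit.

    short_circuit runs with ALL_DONE, so it always executes; its condition is
    "did set_up succeed?". Assumes every do_stuff_* task succeeds if it runs.
    """
    proceed = set_up_state == SUCCESS
    # Assumed ShortCircuitOperator-style behavior: a false condition skips
    # everything downstream instead of failing it.
    return {task: (SUCCESS if proceed else SKIPPED) for task in WORK + ["cleanup"]}


states = states_after_short_circuit(FAILED)
print("handler fires:", one_failed(states[t] for t in WORK))  # handler fires: False
```

If short_circuit instead raised an error, do_stuff_1 would become UPSTREAM_FAILED and, as in our first approach, the handler would fire again.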

However, the result is that the DAG run is marked as successful whereas we'd really like it to be considered failed. Also, checking the status of the set_up task in the short_circuit operator seems to require access (on 1.7.1.3) to quite a lot of "Airflow internals" - something along the lines of:

lambda **kwargs: kwargs['dag'].get_task(...).get_task_instances(session=settings.Session(), start_date=kwargs['execution_date'], end_date=kwargs['execution_date'])[0]...
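Our guess at why the run ends up marked successful, as another toy model: we're assuming the run's final state is derived only from its leaf tasks (those with no downstream tasks), failed if any leaf is FAILED or UPSTREAM_FAILED and successful if all leaves are SUCCESS or SKIPPED. That is our understanding of the scheduler, not something we've checked against the 1.7.1.3 source; `dag_run_state` and the `failure_handler` task name are hypothetical.

```python
# Toy model: derive a DAG run's state from its leaf tasks only.
# This is an assumption about the scheduler's behavior, not Airflow code.
SUCCESS = "success"
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"
SKIPPED = "skipped"
RUNNING = "running"  # placeholder for "not finished yet"


def dag_run_state(task_states, downstream):
    """task_states: task -> state; downstream: task -> list of downstream tasks."""
    # Leaves are tasks with no downstream tasks.
    leaves = [t for t in task_states if not downstream.get(t)]
    leaf_states = [task_states[t] for t in leaves]
    if any(s in (FAILED, UPSTREAM_FAILED) for s in leaf_states):
        return FAILED
    if all(s in (SUCCESS, SKIPPED) for s in leaf_states):
        return SUCCESS
    return RUNNING


# The short_circuit flow after set_up has failed.
DOWNSTREAM = {
    "set_up": ["short_circuit"],
    "short_circuit": ["do_stuff_1"],
    "do_stuff_1": ["do_stuff_2", "failure_handler"],
    "do_stuff_2": ["do_stuff_3", "failure_handler"],
    "do_stuff_3": ["cleanup", "failure_handler"],
}
STATES = {
    "set_up": FAILED,
    "short_circuit": SUCCESS,
    "do_stuff_1": SKIPPED,
    "do_stuff_2": SKIPPED,
    "do_stuff_3": SKIPPED,
    "cleanup": SKIPPED,
    "failure_handler": SKIPPED,
}

# set_up is not a leaf, so its failure never reaches the run state.
print(dag_run_state(STATES, DOWNSTREAM))  # success
```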

So we were wondering if we've perhaps missed an easier/more recommended way to do this?

Thanks for building this great app and making it available, of course!

Regards

ap

[1] https://groups.google.com/forum/#!topic/airbnb_airflow/6cLDFHUUzhE
--
Andrew Phillips
Apache jclouds
