Hi Airflow community!
TL;DR: Is there a recommended way to define a failure handler that
triggers *only* on FAILED, not on UPSTREAM_FAILED?
In more detail: we have a DAG that is essentially this:
set_up -> do_stuff_1 -> do_stuff_2 -> do_stuff_3 -> cleanup
If set_up fails, nothing needs to be done; if do_stuff_1, _2 or _3 fails,
we would like to trigger the cleanup task.
Our first approach was to create essentially a copy of the cleanup
operator, but with trigger rule ONE_FAILED, and to attach that to
do_stuff_1, _2 and _3. However, that didn't quite work as we expected
(on 1.7.1.3) because if set_up failed, do_stuff_1 and the others would
be set to UPSTREAM_FAILED, and this would trigger the failure handler.
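
To make the problem concrete, here is a toy model of the behaviour described above. It is a simplified sketch in plain Python, not Airflow's actual scheduler code: the `resolve` and `run` helpers and the `failure_handler` task name are made up for illustration, and the rule semantics are assumptions based on what we observed on 1.7.1.3 (in particular, that ONE_FAILED treats UPSTREAM_FAILED like FAILED).

```python
# Toy model of trigger-rule evaluation (assumed semantics, not Airflow code).
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"
SUCCESS = "success"
SKIPPED = "skipped"


def resolve(rule, upstream_states, outcome):
    """Return the final state of a task.

    rule            -- "all_success" or "one_failed"
    upstream_states -- final states of the direct upstream tasks
    outcome         -- state the task would reach if it actually ran
    """
    broken = any(s in (FAILED, UPSTREAM_FAILED) for s in upstream_states)
    if rule == "one_failed":
        # Assumption: UPSTREAM_FAILED counts as a failure for ONE_FAILED.
        return outcome if broken else SKIPPED
    if rule == "all_success":
        if broken:
            return UPSTREAM_FAILED
        if SKIPPED in upstream_states:
            return SKIPPED
        return outcome
    raise ValueError("unknown trigger rule: %s" % rule)


def run(failing_task=None):
    """Simulate set_up -> do_stuff_1 -> do_stuff_2 -> do_stuff_3 -> cleanup."""
    chain = ["set_up", "do_stuff_1", "do_stuff_2", "do_stuff_3", "cleanup"]
    states = {}
    upstream = []
    for name in chain:
        outcome = FAILED if name == failing_task else SUCCESS
        states[name] = resolve("all_success", upstream, outcome)
        upstream = [states[name]]
    # Hypothetical failure handler attached to the three do_stuff tasks.
    handler_upstream = [states[n] for n in chain[1:4]]
    states["failure_handler"] = resolve("one_failed", handler_upstream, SUCCESS)
    return states


if __name__ == "__main__":
    for failing in (None, "set_up", "do_stuff_2"):
        print(failing, run(failing)["failure_handler"])
```

In this model the handler runs both when do_stuff_2 fails (wanted) and when set_up fails (not wanted), because the do_stuff tasks end up UPSTREAM_FAILED in the second case.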
We didn't try the approach in [1], of using the on_failure_callback
callback to call the operator directly, because we didn't like the idea
of circumventing the "normal" task execution path. We wanted to avoid
having some tasks run normally, appear in the task instance overview
etc., and some that are essentially invoked "out of band". Also, we
weren't clear on the semantics of templated parameters in such a
situation.
What we have right now is a slightly modified flow:
set_up -> short_circuit -> do_stuff_1 -> do_stuff_2 -> do_stuff_3 -> cleanup
The short_circuit operator is configured to trigger on ALL_DONE, and
succeeds or fails based on the task instance status of the set_up task.
That kind of does the trick in that the failure handler (which is still
attached to do_stuff_1, _2 and _3) does not trigger if the short_circuit
fails.
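
The modified flow can be sketched with the same kind of toy model. Again, this is plain Python rather than Airflow code, and it rests on labelled assumptions: the `resolve` and `run` helpers and the `failure_handler` name are hypothetical, and the short_circuit task is assumed to mark its direct downstream task as skipped when set_up did not succeed (one reading of the description above), with the skip then propagating down the chain.

```python
# Toy model of the short_circuit workaround (assumed semantics, not Airflow code).
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"
SUCCESS = "success"
SKIPPED = "skipped"


def resolve(rule, upstream_states, outcome):
    """Return the final state of a task under the given trigger rule."""
    broken = any(s in (FAILED, UPSTREAM_FAILED) for s in upstream_states)
    if rule == "all_done":
        # Runs once upstream tasks have finished, whatever their state.
        return outcome
    if rule == "one_failed":
        # Assumption: UPSTREAM_FAILED counts as a failure for ONE_FAILED.
        return outcome if broken else SKIPPED
    if rule == "all_success":
        if broken:
            return UPSTREAM_FAILED
        if SKIPPED in upstream_states:
            return SKIPPED
        return outcome
    raise ValueError("unknown trigger rule: %s" % rule)


def run(failing_task=None):
    """Simulate set_up -> short_circuit -> do_stuff_1..3 -> cleanup."""
    states = {"set_up": FAILED if failing_task == "set_up" else SUCCESS}
    # short_circuit uses ALL_DONE, so it runs even if set_up failed.
    states["short_circuit"] = resolve("all_done", [states["set_up"]], SUCCESS)
    proceed = states["set_up"] == SUCCESS
    previous = states["short_circuit"]
    for name in ("do_stuff_1", "do_stuff_2", "do_stuff_3", "cleanup"):
        if name == "do_stuff_1" and not proceed:
            # Assumption: the short circuit skips its direct downstream task.
            states[name] = SKIPPED
        else:
            outcome = FAILED if name == failing_task else SUCCESS
            states[name] = resolve("all_success", [previous], outcome)
        previous = states[name]
    # Hypothetical failure handler, still attached to the do_stuff tasks.
    handler_upstream = [states[n] for n in ("do_stuff_1", "do_stuff_2", "do_stuff_3")]
    states["failure_handler"] = resolve("one_failed", handler_upstream, SUCCESS)
    return states


if __name__ == "__main__":
    for failing in (None, "set_up", "do_stuff_2"):
        print(failing, run(failing))
```

In this model a set_up failure leaves every later task skipped, so the handler no longer fires, while a do_stuff failure still triggers it.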
However, the result is that the DAG run is marked as successful whereas
we'd really like it to be considered failed. Also, checking the status
of the set_up task in the short_circuit operator seems to require access
(on 1.7.1.3) to quite a lot of "Airflow internals" - something along the
lines of:
    lambda **kwargs: kwargs['dag'].get_task(...).get_task_instances(
        session=settings.Session(),
        start_date=kwargs['execution_date'],
        end_date=kwargs['execution_date'])[0]...
So we were wondering if we've perhaps missed an easier/more recommended
way to do this?
Thanks for building this great app and making it available, of course!
Regards
ap
[1] https://groups.google.com/forum/#!topic/airbnb_airflow/6cLDFHUUzhE
--
Andrew Phillips
Apache jclouds