Yes, I often refer to this as the "atomicity problem": we usually want
all four "create", "add steps", "wait for steps", "terminate cluster"
tasks to retry together and in order if any one of them fails (well, the
terminate one is questionable). In our current Dags we resolved this by
putting the tasks in a SubDag and changing the on_retry_callback to
clear the state of the sub-tasks. But use of SubDags makes navigation in
the UI a bit of a pain, so we've planned to merge them into a single
custom operator soon. There's also the problem that for big workflows,
doing this adds a lot of duration due to the management overhead of
starting/stopping EMR clusters instead of reusing them. I'm about to
send a separate e-mail about that.
I think it'd be great to have a combined operator upstream in the codebase!
Jon
On 14/10/2019 20:42, Daniel Mateus Pires wrote:
Hi there!
Would it make sense to add an operator that is both the EmrAddStep operator
and the step sensor?
In a past role we were using Airflow heavily for all things EMR, and I
found myself writing an Operator that combined the emr_add_step operator
and the sensor, it made the pipelines simpler (less operator instances per
DAG) and retries were easy
There is still value in keeping those 2 other classes around when we don't
care about the result of an EMR step or we are polling for the completion
of an EMR step we did not start from Airflow, but for most tasks wouldn't a
"merged operator" make sense?
Thanks!
Daniel