Would this change to EmrAddStepsOperator make sense? If so, I can go ahead
and create a ticket and a PR.
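
For reference, a minimal sketch of the name-to-id lookup discussed below. It
is not the actual operator or hook code. It assumes a boto3-style EMR client
whose list_clusters call accepts ClusterStates and Marker and returns
"Clusters" entries with "Id" and "Name". The function name
get_cluster_id_by_name and the choice of which states count as "live" are
illustrative assumptions.

```python
from typing import Any, Optional

# Assumption: a cluster is "live" if it is in one of these non-terminated states.
LIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def get_cluster_id_by_name(emr_client: Any, cluster_name: str) -> Optional[str]:
    """Return the id of the first live cluster with a matching name, or None.

    emr_client is expected to behave like boto3.client("emr").
    """
    kwargs = {"ClusterStates": LIVE_STATES}
    while True:
        response = emr_client.list_clusters(**kwargs)
        for cluster in response.get("Clusters", []):
            if cluster["Name"] == cluster_name:
                return cluster["Id"]
        # list_clusters is paginated; follow the marker until it runs out.
        marker = response.get("Marker")
        if not marker:
            return None
        kwargs["Marker"] = marker
```

EmrAddStepsOperator could then resolve the name to an id at execution time
when a cluster name is given instead of job_flow_id, so a replaced cluster
is picked up on the next DAG run.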

On Wed, Nov 13, 2019 at 8:11 PM Aviem Zur <[email protected]> wrote:

> Sure.
>
> While most EMR clusters are ephemeral, some of our use cases require
> persistent EMR clusters: the apps they run are short and run at short
> intervals, so the overhead of creating a new EMR cluster is too high.
>
> In these cases I want to make sure that if the cluster dies and is
> replaced by another one nothing needs to change in the DAG.
>
> So if I search by cluster name (in our use case we only have one cluster
> alive for any given name), I can always find the correct cluster ID.
>
> Perhaps instead of a whole operator it can be added to EmrHook as you
> suggested, along with an option to pass either a cluster name or an id
> to EmrAddStepsOperator (which today only accepts a cluster id, via the
> job_flow_id param).
>
> On Wed, Nov 13, 2019 at 5:59 PM Ash Berlin-Taylor <[email protected]> wrote:
>
>> My initial thought is that it doesn't quite sound like a whole operator,
>> but rather a useful function to add to the EmrHook.
>>
>> Could you describe in a little bit more detail how you use it?
>>
>> -a
>>
>> > On 13 Nov 2019, at 15:40, Aviem Zur <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I've created a new operator and want to check the viability of
>> > contributing it to airflow/contrib.
>> >
>> > The operator is called: emr_cluster_name_to_id
>> >
>> > Given an EMR cluster name, it returns the id of the first live cluster
>> > found with a matching name.
>> > This is useful for users with persistent EMR clusters they wish to add
>> > steps to via airflow.
>> > If the cluster dies and is replaced by a new cluster with the same name,
>> > no code or configuration needs to be changed since the operator will
>> > pick up the correct id when the DAG is run.
>> >
>> > Is this a viable operator for airflow/contrib?
>> > If so I'll create a JIRA task and a PR on GitHub.
>> >
>> > Thanks,
>> > Aviem
>>
>>
