Hi,
Flink itself allows the user to specify a String when creating a Job; this
name is visible in the web dashboard and possibly some other places. It
would roughly correspond to the proposed PipelineOptions.pipelineName. An
executing job does not have a human-readable name of its own, just an ID
that has to be used when referring to the job and when communicating with
the master node to manage the job.

I think the proposed changes are very good. However, it might not be
immediately possible to refer to a running pipeline by its jobName, due to
implementation specifics in the runners.
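
To make the gap concrete, here is a minimal sketch of one way a runner
could bridge it: build the default jobName as proposed (pipelineName +
user + datetime + random integer) and keep a lookup from that name to the
runner's own job ID. Everything here is an assumption for illustration:
the class JobNameRegistry, its methods, the "-" separator and the
timestamp format are hypothetical and are not part of Beam or Flink.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not a Beam or Flink class.
final class JobNameRegistry {
    // Assumed timestamp layout; the proposal only says "datetime".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("MMddHHmmss");

    // Maps the human-readable jobName to the runner-assigned job ID.
    private final Map<String, String> runnerIdByJobName = new ConcurrentHashMap<>();

    /** Builds pipelineName + user + datetime + random integer, joined by "-". */
    static String defaultJobName(String pipelineName, String user,
                                 LocalDateTime now, int random) {
        return String.join("-",
                pipelineName.toLowerCase(Locale.ROOT),
                user.toLowerCase(Locale.ROOT),
                now.format(STAMP),
                Integer.toString(random));
    }

    /** Records the runner's ID for a jobName; rejects duplicate names. */
    void register(String jobName, String runnerJobId) {
        String previous = runnerIdByJobName.putIfAbsent(jobName, runnerJobId);
        if (previous != null) {
            throw new IllegalStateException("jobName already in use: " + jobName);
        }
    }

    /** Resolves a jobName to the runner's ID, if that job was registered. */
    Optional<String> lookup(String jobName) {
        return Optional.ofNullable(runnerIdByJobName.get(jobName));
    }
}
```

A caller would register the ID returned by the runner at submission time
and later resolve the jobName back to that ID before talking to the master
node. Such a mapping only lives as long as the process that holds it, which
is part of why this may not work out of the box in every runner.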

Cheers,
Aljoscha

On Tue, 9 Aug 2016 at 21:57 Amit Sela <[email protected]> wrote:

> Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
> and StreamingOptions. Any unification of naming conventions is great IMO,
> and the runner will inherit them as it is.
> As for appName/pipelineName - appName is the same as Spark's app name, but
> I can live happily with pipelineName ;-)
> Considering jobName - that's usually for the resource manager (I use YARN),
> and the proposal sounds great here as well, though I'd have to see how I use
> it programmatically because usually I use the submit script.
>
> +1 and thanks Pei!
>
> Sorry for my late response,
> Amit
>
> On Fri, Aug 5, 2016 at 10:55 PM Pei He <[email protected]> wrote:
>
> > Hi all,
> > I have a proposal about how we name pipelines and their executions.
> > The purpose is to clarify the differences between the two, have
> > consensus between runners, and unify the implementation.
> >
> > Current state:
> >  * PipelineOptions.appName defaults to mainClass name
> >  * DataflowPipelineOptions.jobName defaults to appName+user+datetime
> >  * FlinkPipelineOptions.jobName defaults to appName+user+datetime
> >
> > Proposal:
> > 1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
> >     *  It is the user-visible name for a specific graph.
> >     *  defaults to mainClass name.
> >     *  Use cases: Find all executions of a pipeline
> > 2. Add jobName to top level PipelineOptions.
> >     *  It is the unique name for an execution
> >     *  defaults to pipelineName + user + datetime + random Integer
> >     *  Use cases:
> >         -- Finding all executions by USER_A between TIME_X and TIME_Y
> >         -- Naming resources created by the execution, for example:
> >            writing temp files to folder TMP_DIR/jobName/, writing to
> >            default output file jobName.output, creating temp
> >            /subscriptions/jobName
> >
> > Please let me know what you think.
> >
> > Thanks
> > --
> > Pei
> >
>
