Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
and StreamingOptions. Any unification of naming conventions is great IMO,
and the runner will inherit them as-is.
As for appName/pipelineName - appName is the same as Spark's app name, but
I can live happily with pipelineName ;-)
Considering jobName - that's usually for the resource manager (I use YARN),
and the proposal sounds great here as well, though I'd have to see how I'd
use it programmatically, since I usually use the submit script.

+1 and thanks Pei!

Sorry for my late response,
Amit

On Fri, Aug 5, 2016 at 10:55 PM Pei He <[email protected]> wrote:

> Hi all,
> I have a proposal about how we name pipelines and their executions.
> The purpose is to clarify the differences between the two, have
> consensus between runners, and unify the implementation.
>
> Current states:
>  * PipelineOptions.appName defaults to mainClass name
>  * DataflowPipelineOptions.jobName defaults to appName+user+datetime
>  * FlinkPipelineOptions.jobName defaults to appName+user+datetime
>
> Proposal:
> 1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
>     *  It is the user-visible name for a specific graph.
>     *  default to mainClass name.
>     *  Use cases: Find all executions of a pipeline
> 2. Add jobName to top level PipelineOptions.
>     *  It is the unique name for an execution
>     *  defaults to pipelineName + user + datetime + random Integer
>     *  Use cases:
>         -- Finding all executions by USER_A between TIME_X and TIME_Y
>         -- Naming resources created by the execution. for example:
> Writing temp files to folder TMP_DIR/jobName/, Writing to default
> output file jobName.output, Creating temp /subscriptions/jobName
>
> Please let me know what you think.
>
> Thanks
> --
> Pei
>
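
For concreteness, the jobName default described in the quoted proposal
(pipelineName + user + datetime + random integer) could be sketched roughly
as below. The class and method names here are hypothetical illustrations,
not part of any SDK; the exact format (separators, timestamp pattern,
sanitization) would be up to the implementation:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ThreadLocalRandom;

public class JobNames {
  // Hypothetical sketch of the proposed default:
  // jobName = pipelineName + user + datetime + random integer.
  public static String defaultJobName(String pipelineName) {
    // Sanitize the pieces so the result is safe to use in resource
    // names (temp folders, output files, subscriptions, ...).
    String name = pipelineName.toLowerCase().replaceAll("[^a-z0-9]", "0");
    String user = System.getProperty("user.name", "anonymous")
        .toLowerCase().replaceAll("[^a-z0-9]", "0");
    String stamp = new SimpleDateFormat("MMddHHmmss").format(new Date());
    int nonce = ThreadLocalRandom.current().nextInt(1000, 10000);
    return String.format("%s-%s-%s-%d", name, user, stamp, nonce);
  }
}
```

Keeping pipelineName as a recoverable prefix of jobName is what makes the
"find all executions of a pipeline" use case work, while the datetime and
random suffix keep each execution's name unique.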
