Hi,

Flink itself lets the user specify a String name when creating a job. This name is visible in the web dashboard and possibly in a few other places, so it roughly corresponds to the proposed PipelineOptions.pipelineName. An executing job does not have a human-readable name, only an ID, which must be used to refer to the job and to communicate with the master node when managing it.
I think the proposed changes are very good. However, it might not be immediately possible to refer to a running pipeline by its jobName, due to implementation specifics in the runners.

Cheers,
Aljoscha

On Tue, 9 Aug 2016 at 21:57 Amit Sela wrote:

> Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
> and StreamingOptions. Any unification of naming conventions is great IMO,
> and the runner will inherit them as they are.
> As for appName/pipelineName: appName is the same as Spark's app name, but
> I can live happily with pipelineName ;-)
> As for jobName: that's usually for the resource manager (I use YARN),
> and the proposal sounds great here as well, though I'd have to see how I use
> it programmatically, because I usually use the submit script.
>
> +1 and thanks Pei!
>
> Sorry for my late response,
> Amit
>
> On Fri, Aug 5, 2016 at 10:55 PM Pei He wrote:
>
> > Hi all,
> > I have a proposal about how we name pipelines and their executions.
> > The purpose is to clarify the differences between the two, reach
> > consensus between runners, and unify the implementation.
> >
> > Current state:
> > * PipelineOptions.appName defaults to the mainClass name
> > * DataflowPipelineOptions.jobName defaults to appName+user+datetime
> > * FlinkPipelineOptions.jobName defaults to appName+user+datetime
> >
> > Proposal:
> > 1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
> >   * It is the user-visible name for a specific graph.
> >   * Defaults to the mainClass name.
> >   * Use case: finding all executions of a pipeline.
> > 2. Add jobName to the top-level PipelineOptions.
> >   * It is the unique name for an execution.
> >   * Defaults to pipelineName + user + datetime + random Integer.
> >   * Use cases:
> >     -- Finding all executions by USER_A between TIME_X and TIME_Y
> >     -- Naming resources created by the execution, for example:
> >        writing temp files to folder TMP_DIR/jobName/, writing to the
> >        default output file jobName.output, creating temp
> >        /subscriptions/jobName
> >
> > Please let me know what you think.
> >
> > Thanks
> > --
> > Pei
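To make the proposal concrete, here is a minimal, JDK-only Java sketch of the proposed defaults and the two jobName use cases. The class JobNaming and all of its methods are hypothetical illustrations, not Beam's PipelineOptions API. The proposal does not fix an exact format, so the `-` separator, the lowercase alphanumeric normalization, the UTC `uuuuMMddHHmmss` timestamp, and the hex encoding of the random integer are all assumptions.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Random;

// Hypothetical helpers sketching the proposed naming scheme; NOT Beam's API.
// Assumed jobName format: pipelinename-user-uuuuMMddHHmmss(UTC)-hexrandom
class JobNaming {
  // Assumption: UTC timestamp with one-second precision.
  static final DateTimeFormatter STAMP =
      DateTimeFormatter.ofPattern("uuuuMMddHHmmss").withZone(ZoneOffset.UTC);

  // Proposal item 1: pipelineName defaults to the main class name.
  static String defaultPipelineName(Class<?> mainClass) {
    return mainClass.getSimpleName();
  }

  // Assumption: lowercase and drop anything outside [a-z0-9],
  // so '-' can only ever appear as the separator.
  static String normalize(String s) {
    return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
  }

  // Proposal item 2: jobName = pipelineName + user + datetime + random integer.
  static String defaultJobName(String pipelineName, String user, Instant now, Random random) {
    return String.join("-",
        normalize(pipelineName),
        normalize(user),
        STAMP.format(now),
        // Unsigned hex never contains '-', so the name splits into exactly four parts.
        Integer.toHexString(random.nextInt()));
  }

  // Use case: was this execution started by `user` within [from, to]?
  static boolean startedBy(String jobName, String user, Instant from, Instant to) {
    String[] parts = jobName.split("-");
    if (parts.length != 4 || !parts[1].equals(normalize(user))) {
      return false;
    }
    try {
      Instant started = LocalDateTime.parse(parts[2], STAMP).toInstant(ZoneOffset.UTC);
      return !started.isBefore(from) && !started.isAfter(to);
    } catch (DateTimeParseException e) {
      return false; // not a name produced by defaultJobName
    }
  }

  // Use case: per-execution resource names derived from jobName.
  static String tempDir(String tmpDir, String jobName) {
    return tmpDir + "/" + jobName + "/";
  }

  static String defaultOutput(String jobName) {
    return jobName + ".output";
  }

  static String subscription(String jobName) {
    return "/subscriptions/" + jobName;
  }

  public static void main(String[] args) {
    String job = defaultJobName(defaultPipelineName(JobNaming.class), "pei",
        Instant.parse("2016-08-05T22:55:00Z"), new Random());
    // Starts with "jobnaming-pei-20160805225500-", followed by a random hex suffix.
    System.out.println(job);
    System.out.println(tempDir("/tmp", job));
    System.out.println(defaultOutput(job));
    System.out.println(subscription(job));
  }
}
```

Normalizing each component is a choice made only for this sketch: it guarantees that `-` is a separator, which is what lets startedBy recover the user and start time from a jobName. Real runners may impose their own character and length limits on job names, which this sketch does not model.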
