Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions and StreamingOptions. Any unification of naming conventions is great IMO, and the runner will inherit them as-is. As for appName/pipelineName - appName is the same as Spark's app name, but I can live happily with pipelineName ;-) Considering jobName - that's usually for the resource manager (I use YARN), and the proposal sounds great here as well, though I'd have to see how I'd use it programmatically, because I usually use the submit script.
+1 and thanks Pei!

Sorry for my late response,
Amit

On Fri, Aug 5, 2016 at 10:55 PM Pei He <[email protected]> wrote:
> Hi all,
> I have a proposal about how we name pipelines and their executions.
> The purpose is to clarify the differences between the two, reach
> consensus between runners, and unify the implementation.
>
> Current state:
> * PipelineOptions.appName defaults to the mainClass name
> * DataflowPipelineOptions.jobName defaults to appName + user + datetime
> * FlinkPipelineOptions.jobName defaults to appName + user + datetime
>
> Proposal:
> 1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
>    * It is the user-visible name for a specific graph.
>    * Defaults to the mainClass name.
>    * Use case: finding all executions of a pipeline.
> 2. Add jobName to the top-level PipelineOptions.
>    * It is the unique name for an execution.
>    * Defaults to pipelineName + user + datetime + random integer.
>    * Use cases:
>      -- Finding all executions by USER_A between TIME_X and TIME_Y.
>      -- Naming resources created by the execution, for example:
>         writing temp files to the folder TMP_DIR/jobName/, writing to
>         the default output file jobName.output, creating the temp
>         subscription /subscriptions/jobName.
>
> Please let me know what you think.
>
> Thanks
> --
> Pei
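For illustration, the proposed jobName default (pipelineName + user + datetime + random integer) could be sketched roughly as below. This is a minimal sketch, not Beam's actual implementation; the class name `JobNames`, the method `defaultJobName`, and the exact separator/format are all assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Random;

// Hypothetical sketch of the proposed default:
//   jobName = pipelineName + user + datetime + random integer
// Names and formatting here are illustrative, not Beam's real code.
public class JobNames {

    static String defaultJobName(String pipelineName) {
        // User running the execution (falls back if the property is unset).
        String user = System.getProperty("user.name", "unknown");
        // Timestamp component so executions of the same pipeline sort by time.
        String datetime = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("MMddHHmmss"));
        // Random suffix to disambiguate executions started in the same second.
        int rand = new Random().nextInt(10000);
        // Lowercased so the name is safe for resources (files, subscriptions).
        return String.format("%s-%s-%s-%d", pipelineName, user, datetime, rand)
                .toLowerCase();
    }

    public static void main(String[] args) {
        // e.g. "wordcount-alice-0805225500-4821"
        System.out.println(defaultJobName("WordCount"));
    }
}
```

A unique name like this is what makes the resource-naming use cases above work: TMP_DIR/jobName/ and jobName.output won't collide across concurrent executions of the same pipeline.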
