Hi all,
I have a proposal about how we name pipelines and their executions.
The goal is to clarify the difference between the two, reach
consensus across runners, and unify the implementation.

Current state:
 * PipelineOptions.appName defaults to mainClass name
 * DataflowPipelineOptions.jobName defaults to appName+user+datetime
 * FlinkPipelineOptions.jobName defaults to appName+user+datetime

Proposal:
1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
    *  It is the user-visible name for a specific graph.
    *  Defaults to the mainClass name.
    *  Use case: finding all executions of a pipeline.
2. Add jobName to the top-level PipelineOptions.
    *  It is the unique name for an execution.
    *  Defaults to pipelineName + user + datetime + random integer.
    *  Use cases:
        -- Finding all executions by USER_A between TIME_X and TIME_Y.
        -- Naming resources created by the execution, for example:
           writing temp files to the folder TMP_DIR/jobName/, writing
           to the default output file jobName.output, or creating a
           temporary /subscriptions/jobName.
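
To make the proposed defaults concrete, here is a minimal Java sketch.
It is not existing SDK code: the class JobNameDefaults, its method
names, the "-" separator, the timestamp pattern, and the hex formatting
of the random integer are all assumptions for illustration. The
proposal itself only fixes which components go into each name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; not part of any existing PipelineOptions API.
final class JobNameDefaults {
    // Assumed timestamp layout; the proposal only says "datetime".
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // pipelineName defaults to the main class name (simple name assumed).
    static String defaultPipelineName(Class<?> mainClass) {
        return mainClass.getSimpleName();
    }

    // jobName = pipelineName + user + datetime + random integer.
    // All inputs are parameters so the result is deterministic in tests.
    static String defaultJobName(
            String pipelineName, String user, LocalDateTime now, int random) {
        return String.format(
            "%s-%s-%s-%08x", pipelineName, user, STAMP.format(now), random);
    }

    // Convenience overload that fills in the current user, time, and a random int.
    static String defaultJobName(String pipelineName) {
        return defaultJobName(
            pipelineName,
            System.getProperty("user.name", "unknown"),
            LocalDateTime.now(),
            ThreadLocalRandom.current().nextInt());
    }

    // Example of naming a resource after the execution: TMP_DIR/jobName/
    static String tempFolder(String tmpDir, String jobName) {
        return tmpDir + "/" + jobName + "/";
    }
}
```

With these assumptions, every execution of the same graph shares the
pipelineName prefix, so "find all executions of a pipeline" becomes a
prefix match, while the user, timestamp, and random suffix keep each
jobName unique enough to name per-execution resources.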

Please let me know what you think.

Thanks
--
Pei
