Scott Wegner created BEAM-3372:
----------------------------------

             Summary: Duplicated 'zone' PipelineOption has inconsistent documentation
                 Key: BEAM-3372
                 URL: https://issues.apache.org/jira/browse/BEAM-3372
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Scott Wegner
            Assignee: Thomas Groh
            Priority: Minor


Two different PipelineOptions interfaces define a 'zone' option: GcpOptions 
[1] and DataflowPipelineWorkerPoolOptions [2]. It's not an error for an option 
to be redefined, and internally Beam checks that the definitions are compatible.

In this case the two 'zone' definitions are compatible, but they have different 
descriptions. This is confusing because setting one also affects the other.
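
To illustrate why setting one definition affects the other, here is a minimal 
sketch using only the JDK. It assumes option values are stored by property 
name behind a single proxy object. The interfaces GcpLikeOptions and 
WorkerPoolLikeOptions are hypothetical stand-ins, not the real Beam classes, 
and the handler is a simplified model rather than Beam's implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class SharedZoneSketch {
  // Hypothetical stand-in for the first interface declaring 'zone'.
  public interface GcpLikeOptions {
    String getZone();
    void setZone(String value);
  }

  // Hypothetical stand-in for the second interface declaring 'zone'.
  public interface WorkerPoolLikeOptions {
    String getZone();
    void setZone(String value);
  }

  // Builds one proxy that implements both interfaces.
  // Assumption: values are keyed by property name, so both views share 'Zone'.
  public static Object create() {
    Map<String, Object> values = new HashMap<>();
    InvocationHandler handler = (proxy, method, args) -> {
      String name = method.getName();
      if (name.startsWith("set") && args != null && args.length == 1) {
        values.put(name.substring(3), args[0]);
        return null;
      }
      if (name.startsWith("get")) {
        return values.get(name.substring(3));
      }
      throw new UnsupportedOperationException(name);
    };
    return Proxy.newProxyInstance(
        SharedZoneSketch.class.getClassLoader(),
        new Class<?>[] {GcpLikeOptions.class, WorkerPoolLikeOptions.class},
        handler);
  }

  public static void main(String[] args) {
    Object options = create();
    // Set the value through one view...
    ((GcpLikeOptions) options).setZone("us-central1-f");
    // ...and read it back through the other view.
    System.out.println(((WorkerPoolLikeOptions) options).getZone());
  }
}
```

In this model a user reading either description would be changing the same 
stored value, which is why mismatched descriptions are misleading.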

We should make improvements around duplicate PipelineOptions definitions for a 
given runner. In this case, I propose we:

a) Update the @Description values so that they match.
b) Mark one of them as @Deprecated with a link to the other. Migrate code 
references and plan to remove it in the next major version.
c) Add a test that checks all PipelineOptions on the DataflowRunner classpath 
and verifies that any duplicates have the properties above (equivalent 
definitions including @Description, and only one non-@Deprecated version).
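
The check proposed in (c) could look roughly like the following sketch. It 
uses only JDK reflection. The @Description annotation, the OptionsA and 
OptionsB interfaces, and their description strings are hypothetical stand-ins 
defined here so the example has no dependencies. It takes an explicit list of 
interfaces instead of scanning the classpath, and it compares only the 
description text and deprecation status, not return types or other 
annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DuplicateOptionCheck {
  // Hypothetical stand-in for a description annotation on option getters.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface Description {
    String value();
  }

  // Hypothetical interfaces with made-up description text.
  public interface OptionsA {
    @Description("Zone used for GCP operations.")
    String getZone();
  }

  public interface OptionsB {
    @Deprecated
    @Description("Zone in which to launch workers.")
    String getZone();
  }

  // Returns one message per rule violated by a getter declared more than once.
  public static List<String> findViolations(List<Class<?>> interfaces) {
    // Group declared getters by name across all supplied interfaces.
    Map<String, List<Method>> getters = new TreeMap<>();
    for (Class<?> iface : interfaces) {
      for (Method m : iface.getDeclaredMethods()) {
        if (m.getName().startsWith("get") && m.getParameterCount() == 0) {
          getters.computeIfAbsent(m.getName(), k -> new ArrayList<>()).add(m);
        }
      }
    }
    List<String> violations = new ArrayList<>();
    for (Map.Entry<String, List<Method>> entry : getters.entrySet()) {
      List<Method> methods = entry.getValue();
      if (methods.size() < 2) {
        continue; // Not a duplicate, nothing to check.
      }
      Set<String> descriptions = new HashSet<>();
      int nonDeprecated = 0;
      for (Method m : methods) {
        Description d = m.getAnnotation(Description.class);
        descriptions.add(d == null ? "" : d.value());
        if (!m.isAnnotationPresent(Deprecated.class)) {
          nonDeprecated++;
        }
      }
      if (descriptions.size() > 1) {
        violations.add(entry.getKey() + ": descriptions differ");
      }
      if (nonDeprecated != 1) {
        violations.add(entry.getKey()
            + ": expected exactly one non-deprecated definition, found "
            + nonDeprecated);
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    List<Class<?>> interfaces = Arrays.asList(OptionsA.class, OptionsB.class);
    findViolations(interfaces).forEach(System.out::println);
  }
}
```

A real test would need some way to discover the PipelineOptions interfaces on 
the runner's classpath; that discovery step is left out of this sketch.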


[1] https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L95
[2] https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L175



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
