On Mon, May 6, 2019 at 3:01 PM Ahmet Altay <[email protected]> wrote:

> There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of options that
> is runner specific through that mechanism.
>
> *From: *Reza Rokni <[email protected]>
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * <[email protected]>
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>

In theory at least, many options specified in WorkerOptions can apply to all runners and hence are probably not truly runner-specific (num_workers, zone, worker_machine_type, etc.). Also, moving existing options might be hard for backwards compatibility reasons.
Some of the truly runner-specific options are in XYZRunnerOptions classes. But because there is no namespace, names there have to be globally unique, which can be addressed by introducing the class name as a namespace.

>
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath <[email protected]>
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> There were also discussions[1] in the past about scoping
>>>> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
>>>> PTransforms make this a more general solution?
>>>>
>>>> 1:
>>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>>
>>>
>>> Is this just for pipeline construction time or also for runtime? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>
>>>>
>>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <[email protected]> wrote:
>>>>
>>>>> Having namespaces for options makes sense.
>>>>> I think a help command that prints all the options for a given
>>>>> runner name would also be useful.
>>>>> As for the scope of namespacing, I think that assigning a logical
>>>>> namespace gives more flexibility around how and where we declare options.
>>>>> It also makes future refactoring possible.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Good points. As already mentioned there is no namespacing between the
>>>>>> different pipeline option classes. In particular, there is no separate
>>>>>> namespace for system and user options which is most concerning.
>>>>>>
>>>>>> I'm in favor of an optional namespace using the class name of the
>>>>>> defining pipeline option class.
>>>>>> That way we would at least be able to resolve duplicate option
>>>>>> names. For example, if there was "optionX" in class A and B, we
>>>>>> could use "A#optionX" to refer to it from class A.
>>>>>>
>>>>>
>>> I think this solves the original problem. Runner specific options will
>>> have unique names that include the runner (in options class). I guess to
>>> be complete we also have to include the package (module for Python)?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying (at least for backwards compatibility).
>>>
>>>
>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>>> > runners.
>>>>>> >
>>>>>> > Perhaps for parameters that could have multiple runner relevance, the
>>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>>> > example change the help information to start with a generic reference
>>>>>> > "worker type on the runner" followed by runner specific behavior
>>>>>> > expected for RunnerA, RunnerB etc...
>>>>>> >
>>>>>> > But I do worry that without prefix even generic options could cause
>>>>>> > confusion. For example if the use of --network is substantially
>>>>>> > different between runnerA vs runnerB then the user will only have this
>>>>>> > information by reading the help. It will also mean that a pipeline which
>>>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>>>> > RunnerB could fail because the format of the options to pass to
>>>>>> > --network is different.
>>>>>> >
>>>>>> > Cheers
>>>>>> >
>>>>>> > Reza
>>>>>> >
>>>>>> > *From: *Kenneth Knowles <[email protected]>
>>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>>> > *To: *dev
>>>>>> >
>>>>>> > Even though they are in classes named for specific runners, they are
>>>>>> > not namespaced.
>>>>>> > All PipelineOptions exist in a global namespace so
>>>>>> > they need to be careful to be very precise.
>>>>>> >
>>>>>> > It is a good point that even though there may be multiple uses for
>>>>>> > "machine type" they are probably not going to both happen at the
>>>>>> > same time.
>>>>>> >
>>>>>> > If it becomes an issue, another thing we could do would be to add
>>>>>> > namespacing support so options have less spooky action, or at least
>>>>>> > have a way to resolve it when it happens on accident.
>>>>>> >
>>>>>> > Kenn
>>>>>> >
>>>>>> > On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>>> > <[email protected]> wrote:
>>>>>> >
>>>>>> > Also, we do have runner specific options classes where truly
>>>>>> > runner specific options can go.
>>>>>> >
>>>>>> > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>>> > https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>>> >
>>>>>> > On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <[email protected]> wrote:
>>>>>> >
>>>>>> > I agree, that is a good point.
>>>>>> >
>>>>>> > *From: *Lukasz Cwik <[email protected]>
>>>>>> > *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>>> > *To: *dev
>>>>>> >
>>>>>> > The concept of a machine type isn't necessarily limited
>>>>>> > to Dataflow. If it made sense for a runner, they could
>>>>>> > use AWS/Azure machine types as well.
>>>>>> >
>>>>>> > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>>> > <[email protected]> wrote:
>>>>>> >
>>>>>> > This idea was discussed in a PR a few months ago,
>>>>>> > and JIRA was filed as a follow up [1]. IMO, it makes
>>>>>> > sense to use a namespace prefix.
>>>>>> > The primary issue
>>>>>> > here is that, such a change will very likely be a
>>>>>> > backward incompatible change and would be hard to do
>>>>>> > before the next major version.
>>>>>> >
>>>>>> > [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>>>> >
>>>>>> > *From: *Reza Rokni <[email protected]>
>>>>>> > *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>>> > *To: * <[email protected]>
>>>>>> >
>>>>>> > Hi,
>>>>>> >
>>>>>> > Was reading this SO question:
>>>>>> >
>>>>>> > https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>> >
>>>>>> > And noticed that in
>>>>>> >
>>>>>> > https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>> >
>>>>>> > The option is called --worker_machine_type.
>>>>>> >
>>>>>> > I wonder if runner specific options should have
>>>>>> > the runner in the prefix? Something like
>>>>>> > --dataflow_worker_machine_type?
>>>>>> >
>>>>>> > Cheers
>>>>>> > Reza
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>> > This email may be confidential and privileged.
>>>>>> > If you received this communication by mistake,
>>>>>> > please don't forward it to anyone else, please
>>>>>> > erase all copies and attachments, and please let
>>>>>> > me know that it has gone to the wrong person.
>>>>>> >
>>>>>> > The above terms reflect a potential business
>>>>>> > arrangement, are provided solely as a basis for
>>>>>> > further discussion, and are not intended to be
>>>>>> > and do not constitute a legally binding
>>>>>> > obligation. No legally binding obligations will
>>>>>> > be created, implied, or inferred until an
>>>>>> > agreement in final form is executed in writing
>>>>>> > by all parties involved.
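
For illustration, here is a minimal Python sketch of the "A#optionX" idea from the thread: an option may be qualified with the name of its defining options class, and an unqualified name is accepted only when it is globally unique (the backwards-compatible case). This is not Beam code. The helper names (build_index, resolve, AmbiguousOptionError) and the example classes A and B are hypothetical, and it deliberately ignores the package/module part of the namespace.

```python
from collections import defaultdict


class AmbiguousOptionError(KeyError):
    """Raised when an unqualified option name is declared by several classes."""


def build_index(declared):
    """Map each bare option name to the set of classes that declare it.

    `declared` is a dict of {class_name: iterable of option names}.
    """
    index = defaultdict(set)
    for class_name, names in declared.items():
        for name in names:
            index[name].add(class_name)
    return index


def resolve(option, index):
    """Return the (class_name, option_name) pair that `option` refers to."""
    if "#" in option:
        # Qualified form: "ClassName#option_name".
        class_name, name = option.split("#", 1)
        if class_name not in index.get(name, ()):
            raise KeyError(f"{class_name} does not declare {name!r}")
        return class_name, name

    # Unqualified form: only valid when exactly one class declares the name.
    owners = index.get(option, set())
    if not owners:
        raise KeyError(f"unknown option {option!r}")
    if len(owners) > 1:
        raise AmbiguousOptionError(
            f"{option!r} is declared by {sorted(owners)}; qualify it as Class#{option}"
        )
    return next(iter(owners)), option


# Hypothetical option classes A and B, both declaring "optionX".
INDEX = build_index({"A": ["optionX", "optionY"], "B": ["optionX"]})
```

With this index, resolve("A#optionX", INDEX) picks the option from class A, resolve("optionY", INDEX) still works unqualified because only A declares it, and resolve("optionX", INDEX) fails with AmbiguousOptionError until the caller qualifies it.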
