What I mean is that a user may find that it works for them to pass "--myarg blah" and access it as "options.myarg" without explicitly defining a "my_arg" flag due to the added logic. This is not the intended behavior and we may want to change this implementation detail in the future. However, having this logic in a released version makes it hard to change this behavior since users may erroneously depend on this undocumented behavior. Instead, we should namespace / scope this so that it is obvious that this is meant for runner (and not Beam user) consumption.
On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise <t...@apache.org> wrote: > Can you please elaborate more what practical problems this introduces for > users? > > I can see that this change allows a user to specify a runner specific > option, which in the future may change because we decide to scope > differently. If this only affects users of the portable Flink runner (like > us), then no need to revert, because at this early stage we prefer > something that works over being blocked. > > It would also be really great if some of the core Python SDK developers > could help out with the design aspects and PR reviews of changes that > affect common Python code. Anyone who specifically wants to be tagged on > relevant JIRAs and PRs? > > Thanks > > > On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay <al...@google.com> wrote: > >> >> >> On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen <c...@google.com> wrote: >> >>> For context, I made comments on https://github.com/apache/beam/pull/6600 >>> noting that the changes being made were not good for Beam >>> backwards-compatibility. The change as is allows users to use pipeline >>> options without explicitly defining them, which is not the type of usage we >>> would like to encourage since we prefer to be explicit whenever possible. >>> If users write pipelines with this sort of pattern, they will potentially >>> encounter pain when upgrading to a later version since this is an >>> implementation detail and not an officially supported pattern. I agree >>> with the comments above that this is ultimately a scoping issue. I would >>> not have a problem with these changes if they were explicitly scoped under >>> either a runner or unparsed options namespace. >>> >>> As a second note, since the 2.8.0 release is being cut right now, >>> because of these backwards-compatibility concerns, I would suggest >>> reverting these changes, at least until 2.8.0 is cut, so we can have a >>> discussion here before committing to and releasing any API-level changes. >>> >> >> +1 I would like to revert the changes in order not rush this into the >> release. Once this discussion results in an agreement changes can be >> brought back. >> >> >>> >>> On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde <hero...@google.com> >>> wrote: >>> >>>> Agree that pipeline options lack some mechanism for scoping. It is also >>>> not always possible distinguish options meant to be consumed at pipeline >>>> construction time, by the runner, by the SDK harness, by the user code or >>>> any combination -- and this causes confusion every now and then. >>>> >>>> For Dataflow, we have been using "experiments" for arbitrary >>>> runner-specific options. It's simply a string list pipeline option that all >>>> SDKs support and, for Go at least, is sent to portable runners. Flink can >>>> do the same in the short term to move forward. >>>> >>>> Henning >>>> >>>> >>>> On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise <t...@apache.org> wrote: >>>> >>>>> [moving to the list] >>>>> >>>>> The requirement driving this part of the change was to allow a user to >>>>> specify pipeline options that a runner supports without having to declare >>>>> those in each language SDK. >>>>> >>>>> In the specific scenario, we have options that the Flink runner >>>>> supports (and can validate), that are not enumerated in the Python SDK. >>>>> >>>>> I think we have a bigger problem scoping pipeline options. For >>>>> example, the runner options are dumped into the SDK worker. There is also >>>>> a >>>>> possibility of name collisions. So I think this would benefit from broader >>>>> feedback. >>>>> >>>>> Thanks, >>>>> Thomas >>>>> >>>>> >>>>> ---------- Forwarded message --------- >>>>> From: Charles Chen <notificati...@github.com> >>>>> Date: Fri, Oct 12, 2018 at 8:36 AM >>>>> Subject: Re: [apache/beam] [BEAM-5442] Store duplicate unknown options >>>>> in a list argument (#6600) >>>>> To: apache/beam <b...@noreply.github.com> >>>>> Cc: Thomas Weise <thomas.we...@gmail.com>, Mention < >>>>> ment...@noreply.github.com> >>>>> >>>>> >>>>> CC: @tweise <https://github.com/tweise> >>>>> >>>>> — >>>>> You are receiving this because you were mentioned. >>>>> Reply to this email directly, view it on GitHub >>>>> <https://github.com/apache/beam/pull/6600#issuecomment-429367754>, or mute >>>>> the thread >>>>> <https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T> >>>>> . >>>>> >>>> >>