+1 for explicit --runner_option=param=val,... It's hard to tell otherwise where an option is going to,
On Mon, Oct 15, 2018 at 8:04 AM Robert Bradshaw <rober...@google.com> wrote: > On Mon, Oct 15, 2018 at 3:58 PM Maximilian Michels <m...@apache.org> wrote: > > > > I agree that the current approach breaks the pipeline options contract > > because "unknown" options get parsed in the same way as options which > > have been defined by the user. > > FWIW, I think we're already breaking this "contract." Unknown options > are silently ignored; with this change we just change how we record > them. It still feels a bit hacky though. > > > I'm not sure the `experiments` flag works for us. AFAIK it only allows > > true/false flags. We want to pass all types of pipeline options to the > > Runner. > > Experiments is an arbitrary set of strings, which can be of the form > "param=value" if that's useful. (Dataflow does this.) There is, again, > no namespacing on the param names, but we could user urns or impose > some other structure here. > > > How to solve this? > > > > 1) Add all options of all Runners to each SDK > > We added some of the FlinkRunner options to the Python SDK but realized > > syncing is rather cumbersome in the long term. However, we want the most > > important options to be validated on the client side. > > I don't think this is sustainable in the long run. However, thinking > about this, in the worse case validation happens after construction > but before execution (as with much of our other validation) so it > isn't that bad. > > > 2) Pass "unknown" options via a separate list in the Proto which can > > only be accessed internally by the Runners. This still allows passing > > arbitrary options but we wouldn't leak unknown options and display them > > as top-level options. > > I think there needs to be a way for the user to communicate values > directly to the runner regardless of the SDK. My preference would be > to make this explicit, e.g. (repeated) --runner_option=..., rather > than scooping up all unknown flags at command line parsing time. > Perhaps an SDK that is aware of some runners could choose to lift > these as top-level options, but still pass them as runner options. > > > On 13.10.18 02:34, Charles Chen wrote: > > > The current release branch > > > (https://github.com/apache/beam/commits/release-2.8.0) was cut after > the > > > revert went in. Sent out https://github.com/apache/beam/pull/6683 as > a > > > revert of the revert. Regarding your comment above, I can help out > with > > > the design / PR reviews for common Python code as you suggest. > > > > > > On Fri, Oct 12, 2018 at 4:48 PM Thomas Weise <t...@apache.org > > > <mailto:t...@apache.org>> wrote: > > > > > > Thanks, will tag you and looking forward to feedback so we can > > > ensure that changes work for everyone. > > > > > > Looking at the PR, I see agreement from Max to revert the change on > > > the release branch, but not in master. Would you mind to restore it > > > in master? > > > > > > Thanks > > > > > > On Fri, Oct 12, 2018 at 4:40 PM Ahmet Altay <al...@google.com > > > <mailto:al...@google.com>> wrote: > > > > > > > > > > > > On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen <c...@google.com > > > <mailto:c...@google.com>> wrote: > > > > > > What I mean is that a user may find that it works for them > > > to pass "--myarg blah" and access it as "options.myarg" > > > without explicitly defining a "my_arg" flag due to the > added > > > logic. This is not the intended behavior and we may want > to > > > change this implementation detail in the future. However, > > > having this logic in a released version makes it hard to > > > change this behavior since users may erroneously depend on > > > this undocumented behavior. Instead, we should namespace / > > > scope this so that it is obvious that this is meant for > > > runner (and not Beam user) consumption. > > > > > > On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise > > > <t...@apache.org <mailto:t...@apache.org>> wrote: > > > > > > Can you please elaborate more what practical problems > > > this introduces for users? > > > > > > I can see that this change allows a user to specify a > > > runner specific option, which in the future may change > > > because we decide to scope differently. If this only > > > affects users of the portable Flink runner (like us), > > > then no need to revert, because at this early stage we > > > prefer something that works over being blocked. > > > > > > It would also be really great if some of the core > Python > > > SDK developers could help out with the design aspects > > > and PR reviews of changes that affect common Python > > > code. Anyone who specifically wants to be tagged on > > > relevant JIRAs and PRs? > > > > > > > > > I would be happy to be tagged, and I can also help with > > > including other relevant folks whenever possible. In general I > > > think Robert, Charles, myself are good candidates. > > > > > > > > > Thanks > > > > > > > > > On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay > > > <al...@google.com <mailto:al...@google.com>> wrote: > > > > > > > > > > > > On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen > > > <c...@google.com <mailto:c...@google.com>> wrote: > > > > > > For context, I made comments on > > > https://github.com/apache/beam/pull/6600 > noting > > > that the changes being made were not good for > > > Beam backwards-compatibility. The change as is > > > allows users to use pipeline options without > > > explicitly defining them, which is not the type > > > of usage we would like to encourage since we > > > prefer to be explicit whenever possible. If > > > users write pipelines with this sort of > pattern, > > > they will potentially encounter pain when > > > upgrading to a later version since this is an > > > implementation detail and not an officially > > > supported pattern. I agree with the comments > > > above that this is ultimately a scoping issue. > > > I would not have a problem with these changes > if > > > they were explicitly scoped under either a > > > runner or unparsed options namespace. > > > > > > As a second note, since the 2.8.0 release is > > > being cut right now, because of these > > > backwards-compatibility concerns, I would > > > suggest reverting these changes, at least until > > > 2.8.0 is cut, so we can have a discussion here > > > before committing to and releasing any > API-level > > > changes. > > > > > > > > > +1 I would like to revert the changes in order not > > > rush this into the release. Once this discussion > > > results in an agreement changes can be brought > back. > > > > > > > > > On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde > > > <hero...@google.com <mailto:hero...@google.com > >> > > > wrote: > > > > > > Agree that pipeline options lack some > > > mechanism for scoping. It is also not > always > > > possible distinguish options meant to be > > > consumed at pipeline construction time, by > > > the runner, by the SDK harness, by the user > > > code or any combination -- and this causes > > > confusion every now and then. > > > > > > For Dataflow, we have been using > > > "experiments" for arbitrary runner-specific > > > options. It's simply a string list pipeline > > > option that all SDKs support and, for Go at > > > least, is sent to portable runners. Flink > > > can do the same in the short term to move > > > forward. > > > > > > Henning > > > > > > > > > On Fri, Oct 12, 2018 at 8:50 AM Thomas > Weise > > > <t...@apache.org <mailto:t...@apache.org>> > wrote: > > > > > > [moving to the list] > > > > > > The requirement driving this part of > the > > > change was to allow a user to specify > > > pipeline options that a runner supports > > > without having to declare those in each > > > language SDK. > > > > > > In the specific scenario, we have > > > options that the Flink runner supports > > > (and can validate), that are not > > > enumerated in the Python SDK. > > > > > > I think we have a bigger problem > scoping > > > pipeline options. For example, the > > > runner options are dumped into the SDK > > > worker. There is also a possibility of > > > name collisions. So I think this would > > > benefit from broader feedback. > > > > > > Thanks, > > > Thomas > > > > > > > > > ---------- Forwarded message --------- > > > From: *Charles Chen* > > > <notificati...@github.com > > > <mailto:notificati...@github.com>> > > > Date: Fri, Oct 12, 2018 at 8:36 AM > > > Subject: Re: [apache/beam] [BEAM-5442] > > > Store duplicate unknown options in a > > > list argument (#6600) > > > To: apache/beam < > b...@noreply.github.com > > > <mailto:b...@noreply.github.com>> > > > Cc: Thomas Weise < > thomas.we...@gmail.com > > > <mailto:thomas.we...@gmail.com>>, > > > Mention <ment...@noreply.github.com > > > <mailto:ment...@noreply.github.com>> > > > > > > > > > CC: @tweise <https://github.com/tweise > > > > > > > > — > > > You are receiving this because you were > > > mentioned. > > > Reply to this email directly, view it > on > > > GitHub > > > < > https://github.com/apache/beam/pull/6600#issuecomment-429367754>, > > > or mute the thread > > > < > https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T > >. > > > > > > > > > >
smime.p7s
Description: S/MIME Cryptographic Signature