+1 for explicit --runner_option=param=val,...
It's hard to tell otherwise where an option is going to,

On Mon, Oct 15, 2018 at 8:04 AM Robert Bradshaw <rober...@google.com> wrote:

> On Mon, Oct 15, 2018 at 3:58 PM Maximilian Michels <m...@apache.org> wrote:
> >
> > I agree that the current approach breaks the pipeline options contract
> > because "unknown" options get parsed in the same way as options which
> > have been defined by the user.
>
> FWIW, I think we're already breaking this "contract." Unknown options
> are silently ignored; with this change we just change how we record
> them. It still feels a bit hacky though.
>
> > I'm not sure the `experiments` flag works for us. AFAIK it only allows
> > true/false flags. We want to pass all types of pipeline options to the
> > Runner.
>
> Experiments is an arbitrary set of strings, which can be of the form
> "param=value" if that's useful. (Dataflow does this.) There is, again,
> no namespacing on the param names, but we could user urns or impose
> some other structure here.
>
> > How to solve this?
> >
> > 1) Add all options of all Runners to each SDK
> > We added some of the FlinkRunner options to the Python SDK but realized
> > syncing is rather cumbersome in the long term. However, we want the most
> > important options to be validated on the client side.
>
> I don't think this is sustainable in the long run. However, thinking
> about this, in the worse case validation happens after construction
> but before execution (as with much of our other validation) so it
> isn't that bad.
>
> > 2) Pass "unknown" options via a separate list in the Proto which can
> > only be accessed internally by the Runners. This still allows passing
> > arbitrary options but we wouldn't leak unknown options and display them
> > as top-level options.
>
> I think there needs to be a way for the user to communicate values
> directly to the runner regardless of the SDK. My preference would be
> to make this explicit, e.g. (repeated) --runner_option=..., rather
> than scooping up all unknown flags at command line parsing time.
> Perhaps an SDK that is aware of some runners could choose to lift
> these as top-level options, but still pass them as runner options.
>
> > On 13.10.18 02:34, Charles Chen wrote:
> > > The current release branch
> > > (https://github.com/apache/beam/commits/release-2.8.0) was cut after
> the
> > > revert went in.  Sent out https://github.com/apache/beam/pull/6683 as
> a
> > > revert of the revert.  Regarding your comment above, I can help out
> with
> > > the design / PR reviews for common Python code as you suggest.
> > >
> > > On Fri, Oct 12, 2018 at 4:48 PM Thomas Weise <t...@apache.org
> > > <mailto:t...@apache.org>> wrote:
> > >
> > >     Thanks, will tag you and looking forward to feedback so we can
> > >     ensure that changes work for everyone.
> > >
> > >     Looking at the PR, I see agreement from Max to revert the change on
> > >     the release branch, but not in master. Would you mind to restore it
> > >     in master?
> > >
> > >     Thanks
> > >
> > >     On Fri, Oct 12, 2018 at 4:40 PM Ahmet Altay <al...@google.com
> > >     <mailto:al...@google.com>> wrote:
> > >
> > >
> > >
> > >         On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen <c...@google.com
> > >         <mailto:c...@google.com>> wrote:
> > >
> > >             What I mean is that a user may find that it works for them
> > >             to pass "--myarg blah" and access it as "options.myarg"
> > >             without explicitly defining a "my_arg" flag due to the
> added
> > >             logic.  This is not the intended behavior and we may want
> to
> > >             change this implementation detail in the future.  However,
> > >             having this logic in a released version makes it hard to
> > >             change this behavior since users may erroneously depend on
> > >             this undocumented behavior.  Instead, we should namespace /
> > >             scope this so that it is obvious that this is meant for
> > >             runner (and not Beam user) consumption.
> > >
> > >             On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise
> > >             <t...@apache.org <mailto:t...@apache.org>> wrote:
> > >
> > >                 Can you please elaborate more what practical problems
> > >                 this introduces for users?
> > >
> > >                 I can see that this change allows a user to specify a
> > >                 runner specific option, which in the future may change
> > >                 because we decide to scope differently. If this only
> > >                 affects users of the portable Flink runner (like us),
> > >                 then no need to revert, because at this early stage we
> > >                 prefer something that works over being blocked.
> > >
> > >                 It would also be really great if some of the core
> Python
> > >                 SDK developers could help out with the design aspects
> > >                 and PR reviews of changes that affect common Python
> > >                 code. Anyone who specifically wants to be tagged on
> > >                 relevant JIRAs and PRs?
> > >
> > >
> > >         I would be happy to be tagged, and I can also help with
> > >         including other relevant folks whenever possible. In general I
> > >         think Robert, Charles, myself are good candidates.
> > >
> > >
> > >                 Thanks
> > >
> > >
> > >                 On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay
> > >                 <al...@google.com <mailto:al...@google.com>> wrote:
> > >
> > >
> > >
> > >                     On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen
> > >                     <c...@google.com <mailto:c...@google.com>> wrote:
> > >
> > >                         For context, I made comments on
> > >                         https://github.com/apache/beam/pull/6600
> noting
> > >                         that the changes being made were not good for
> > >                         Beam backwards-compatibility.  The change as is
> > >                         allows users to use pipeline options without
> > >                         explicitly defining them, which is not the type
> > >                         of usage we would like to encourage since we
> > >                         prefer to be explicit whenever possible.  If
> > >                         users write pipelines with this sort of
> pattern,
> > >                         they will potentially encounter pain when
> > >                         upgrading to a later version since this is an
> > >                         implementation detail and not an officially
> > >                         supported pattern.  I agree with the comments
> > >                         above that this is ultimately a scoping issue.
> > >                         I would not have a problem with these changes
> if
> > >                         they were explicitly scoped under either a
> > >                         runner or unparsed options namespace.
> > >
> > >                         As a second note, since the 2.8.0 release is
> > >                         being cut right now, because of these
> > >                         backwards-compatibility concerns, I would
> > >                         suggest reverting these changes, at least until
> > >                         2.8.0 is cut, so we can have a discussion here
> > >                         before committing to and releasing any
> API-level
> > >                         changes.
> > >
> > >
> > >                     +1 I would like to revert the changes in order not
> > >                     rush this into the release. Once this discussion
> > >                     results in an agreement changes can be brought
> back.
> > >
> > >
> > >                         On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde
> > >                         <hero...@google.com <mailto:hero...@google.com
> >>
> > >                         wrote:
> > >
> > >                             Agree that pipeline options lack some
> > >                             mechanism for scoping. It is also not
> always
> > >                             possible distinguish options meant to be
> > >                             consumed at pipeline construction time, by
> > >                             the runner, by the SDK harness, by the user
> > >                             code or any combination -- and this causes
> > >                             confusion every now and then.
> > >
> > >                             For Dataflow, we have been using
> > >                             "experiments" for arbitrary runner-specific
> > >                             options. It's simply a string list pipeline
> > >                             option that all SDKs support and, for Go at
> > >                             least, is sent to portable runners. Flink
> > >                             can do the same in the short term to move
> > >                             forward.
> > >
> > >                             Henning
> > >
> > >
> > >                             On Fri, Oct 12, 2018 at 8:50 AM Thomas
> Weise
> > >                             <t...@apache.org <mailto:t...@apache.org>>
> wrote:
> > >
> > >                                 [moving to the list]
> > >
> > >                                 The requirement driving this part of
> the
> > >                                 change was to allow a user to specify
> > >                                 pipeline options that a runner supports
> > >                                 without having to declare those in each
> > >                                 language SDK.
> > >
> > >                                 In the specific scenario, we have
> > >                                 options that the Flink runner supports
> > >                                 (and can validate), that are not
> > >                                 enumerated in the Python SDK.
> > >
> > >                                 I think we have a bigger problem
> scoping
> > >                                 pipeline options. For example, the
> > >                                 runner options are dumped into the SDK
> > >                                 worker. There is also a possibility of
> > >                                 name collisions. So I think this would
> > >                                 benefit from broader feedback.
> > >
> > >                                 Thanks,
> > >                                 Thomas
> > >
> > >
> > >                                 ---------- Forwarded message ---------
> > >                                 From: *Charles Chen*
> > >                                 <notificati...@github.com
> > >                                 <mailto:notificati...@github.com>>
> > >                                 Date: Fri, Oct 12, 2018 at 8:36 AM
> > >                                 Subject: Re: [apache/beam] [BEAM-5442]
> > >                                 Store duplicate unknown options in a
> > >                                 list argument (#6600)
> > >                                 To: apache/beam <
> b...@noreply.github.com
> > >                                 <mailto:b...@noreply.github.com>>
> > >                                 Cc: Thomas Weise <
> thomas.we...@gmail.com
> > >                                 <mailto:thomas.we...@gmail.com>>,
> > >                                 Mention <ment...@noreply.github.com
> > >                                 <mailto:ment...@noreply.github.com>>
> > >
> > >
> > >                                 CC: @tweise <https://github.com/tweise
> >
> > >
> > >                                 —
> > >                                 You are receiving this because you were
> > >                                 mentioned.
> > >                                 Reply to this email directly, view it
> on
> > >                                 GitHub
> > >                                 <
> https://github.com/apache/beam/pull/6600#issuecomment-429367754>,
> > >                                 or mute the thread
> > >                                 <
> https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T
> >.
> > >
> > >
> > >
>

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to