What I mean is that a user may find that it works for them to pass "--myarg
blah" and access it as "options.myarg" without explicitly defining a
"my_arg" flag due to the added logic.  This is not the intended behavior
and we may want to change this implementation detail in the future.
However, having this logic in a released version makes it hard to change
this behavior since users may erroneously depend on this undocumented
behavior.  Instead, we should namespace / scope this so that it is obvious
that this is meant for runner (and not Beam user) consumption.

On Fri, Oct 12, 2018 at 10:48 AM Thomas Weise <t...@apache.org> wrote:

> Can you please elaborate more what practical problems this introduces for
> users?
>
> I can see that this change allows a user to specify a runner specific
> option, which in the future may change because we decide to scope
> differently. If this only affects users of the portable Flink runner (like
> us), then no need to revert, because at this early stage we prefer
> something that works over being blocked.
>
> It would also be really great if some of the core Python SDK developers
> could help out with the design aspects and PR reviews of changes that
> affect common Python code. Anyone who specifically wants to be tagged on
> relevant JIRAs and PRs?
>
> Thanks
>
>
> On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay <al...@google.com> wrote:
>
>>
>>
>> On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen <c...@google.com> wrote:
>>
>>> For context, I made comments on https://github.com/apache/beam/pull/6600
>>> noting that the changes being made were not good for Beam
>>> backwards-compatibility.  The change as is allows users to use pipeline
>>> options without explicitly defining them, which is not the type of usage we
>>> would like to encourage since we prefer to be explicit whenever possible.
>>> If users write pipelines with this sort of pattern, they will potentially
>>> encounter pain when upgrading to a later version since this is an
>>> implementation detail and not an officially supported pattern.  I agree
>>> with the comments above that this is ultimately a scoping issue.  I would
>>> not have a problem with these changes if they were explicitly scoped under
>>> either a runner or unparsed options namespace.
>>>
>>> As a second note, since the 2.8.0 release is being cut right now,
>>> because of these backwards-compatibility concerns, I would suggest
>>> reverting these changes, at least until 2.8.0 is cut, so we can have a
>>> discussion here before committing to and releasing any API-level changes.
>>>
>>
>> +1 I would like to revert the changes in order not rush this into the
>> release. Once this discussion results in an agreement changes can be
>> brought back.
>>
>>
>>>
>>> On Fri, Oct 12, 2018 at 9:26 AM Henning Rohde <hero...@google.com>
>>> wrote:
>>>
>>>> Agree that pipeline options lack some mechanism for scoping. It is also
>>>> not always possible distinguish options meant to be consumed at pipeline
>>>> construction time, by the runner, by the SDK harness, by the user code or
>>>> any combination -- and this causes confusion every now and then.
>>>>
>>>> For Dataflow, we have been using "experiments" for arbitrary
>>>> runner-specific options. It's simply a string list pipeline option that all
>>>> SDKs support and, for Go at least, is sent to portable runners. Flink can
>>>> do the same in the short term to move forward.
>>>>
>>>> Henning
>>>>
>>>>
>>>> On Fri, Oct 12, 2018 at 8:50 AM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> [moving to the list]
>>>>>
>>>>> The requirement driving this part of the change was to allow a user to
>>>>> specify pipeline options that a runner supports without having to declare
>>>>> those in each language SDK.
>>>>>
>>>>> In the specific scenario, we have options that the Flink runner
>>>>> supports (and can validate), that are not enumerated in the Python SDK.
>>>>>
>>>>> I think we have a bigger problem scoping pipeline options. For
>>>>> example, the runner options are dumped into the SDK worker. There is also 
>>>>> a
>>>>> possibility of name collisions. So I think this would benefit from broader
>>>>> feedback.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> ---------- Forwarded message ---------
>>>>> From: Charles Chen <notificati...@github.com>
>>>>> Date: Fri, Oct 12, 2018 at 8:36 AM
>>>>> Subject: Re: [apache/beam] [BEAM-5442] Store duplicate unknown options
>>>>> in a list argument (#6600)
>>>>> To: apache/beam <b...@noreply.github.com>
>>>>> Cc: Thomas Weise <thomas.we...@gmail.com>, Mention <
>>>>> ment...@noreply.github.com>
>>>>>
>>>>>
>>>>> CC: @tweise <https://github.com/tweise>
>>>>>
>>>>> —
>>>>> You are receiving this because you were mentioned.
>>>>> Reply to this email directly, view it on GitHub
>>>>> <https://github.com/apache/beam/pull/6600#issuecomment-429367754>, or mute
>>>>> the thread
>>>>> <https://github.com/notifications/unsubscribe-auth/AAQGDwwt15R85eq9pySUisyxq2HYz-Vyks5ukLcLgaJpZM4XMo-T>
>>>>> .
>>>>>
>>>>
>>

Reply via email to