Getting back to this: I think Luke has outlined a good implementation
strategy. I have not followed the progress on getting this documented durably
and voted on. Maybe start with a gdoc draft to vote on and then move it to
the web site, since it should be *very* stable and also forms the new core
of what Beam "is". The concepts should be explained clearly at a high level,
with good PR review of any changes to the protocol and documentation.

Kenn

On Fri, Jun 12, 2020 at 1:14 PM Udi Meiri <[email protected]> wrote:

> I'm not very familiar with this effort.
> Were there ITs / POCs created for these changes? (to surface any obvious
> bugs)
> Are these changes usable in DirectRunner?
>
>
> On Fri, Jun 12, 2020 at 8:50 AM Luke Cwik <[email protected]> wrote:
>
>> A few months back there was a discussion[1] about performing work to
>> stabilize the protos used for pipeline execution, in anticipation of
>> cross-language pipelines and runners that want to use them across SDK
>> versions (Dataflow).
>>
>> All the proposed incompatible clean-up tasks were done and made it into
>> 2.21 (there are some left related to documentation and cleaning up some
>> stuff that can be removed in a backwards compatible way and general
>> re-organization within the files to delineate what is stable and what
>> isn't).
>>
>> Beyond documenting the versioning story (sketch below) in a more durable
>> location than this ML, performing these last clean-up tasks and general
>> re-organization within the files, is there anything else that should be
>> done before we can vote and consider the protos to be stable (which would
>> mean that 2.21 would contain the first stable version assuming no other
>> incompatible changes are suggested)?
>>
>> The versioning story has three parts and comes into play whenever there
>> is an incompatible change, such as:
>> * adding a new field that semantically changes what is to be done
>> * removing a field that was effectively required
>> * requiring an SDK or runner to behave differently (e.g. support large
>> iterables, support a new API (such as a future map state for StatefulDoFns))
>>
>> The three ways of handling versioning for incompatible changes are:
>> * many protos have URNs; when there is an incompatible change, the URN
>> should be changed. If it is effectively the same thing then this should
>> lead to a version bump and update of the documentation reflecting what the
>> requirements of the new version are.
>> * there is a capabilities section on each environment, this should
>> enumerate everything the SDK can support, protocols (e.g. large iterables,
>> ...), coders, well known transforms, ...
>> * there is a requirements section on the pipeline proto, this is an
>> enumeration of everything the SDK needs the runner to know to be able to
>> interpret the pipeline (e.g. splittable dofn, requires time sorted input,
>> ...).
>>
>> Updating the URN of the transform/coder is typically the easiest way to
>> handle incompatible changes, followed by using the capabilities list to
>> enable new things (used like an allowlist) and the requirements list to
>> prevent runners from doing things they shouldn't (used like a denylist).
>> Many features/APIs that are part of the initial version are implicitly not
>> in either the capabilities or requirements lists to prevent a huge
>> definition list and can be disabled in the future by relying on adding
>> requirements that disable these currently unnamed features/APIs if it is
>> ever necessary.
>>
>> 1:
>> https://lists.apache.org/thread.html/rdf247cfa3a509f80578f03b2454ea1e50474ee3576a059486d58fdf4%40%3Cdev.beam.apache.org%3E
>>
>
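
As a rough illustration of the three mechanisms described in the quoted
message (URN version bumps, a capabilities list per environment, and a
requirements list on the pipeline), here is a minimal Python sketch. It does
not use the real Beam protos or any Beam API: the classes, the function
names, and every "example:..." URN string are hypothetical and exist only to
show the allowlist/denylist logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Environment:
    # Hypothetical stand-in: everything the SDK says it can support.
    capabilities: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Pipeline:
    # Hypothetical stand-in: everything the runner must understand.
    requirements: FrozenSet[str] = frozenset()


def bump_urn_version(urn: str) -> str:
    """Return the URN with its trailing ':v<N>' suffix incremented by one."""
    prefix, sep, version = urn.rpartition(":v")
    if not sep or not version.isdigit():
        raise ValueError(f"URN has no trailing :v<N> version: {urn!r}")
    return f"{prefix}:v{int(version) + 1}"


def sdk_supports(environment: Environment, feature_urn: str) -> bool:
    """Allowlist check: a runner may use a feature only if the SDK lists it."""
    return feature_urn in environment.capabilities


def unsupported_requirements(
    pipeline: Pipeline, runner_understands: FrozenSet[str]
) -> List[str]:
    """Denylist check: requirements the runner does not understand."""
    return sorted(pipeline.requirements - runner_understands)


# All URNs below are made-up placeholders, not real Beam URNs.
env = Environment(
    capabilities=frozenset({"example:protocol:large_iterables:v1"})
)
pipeline = Pipeline(
    requirements=frozenset(
        {
            "example:requirement:splittable_dofn:v1",
            "example:requirement:time_sorted_input:v1",
        }
    )
)
runner_understands = frozenset({"example:requirement:splittable_dofn:v1"})

missing = unsupported_requirements(pipeline, runner_understands)
if missing:
    print("Runner should reject the pipeline; unknown requirements:", missing)
```

In this sketch a runner would refuse to run the pipeline whenever
`unsupported_requirements` returns a non-empty list, and would only opt in to
a newer protocol when `sdk_supports` returns True for it.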
