Re: Streaming update compatibility

Johanna Öjeling via dev Thu, 26 Oct 2023 20:09:50 -0700

Alright, then it is clearer. Thank you for your answers!

On Thu, Oct 26, 2023, 20:36 Robert Bradshaw <rober...@google.com> wrote:


> On Thu, Oct 26, 2023 at 3:59 AM Johanna Öjeling <joha...@ojeling.net>
> wrote:
> >
> > Hi,
> >
> > I like this idea of making it easier to push out improvements, and had a
> > look at the PR.
> >
> > One question to better understand how it works today:
> >
> > The upgrades that the runners do, such as those not visible to the user,
> > can they be initiated at any time, or do they only happen when the user
> > updates the running pipeline, e.g. with new user code?
>
> Correct. We're talking about user-initiated changes to their pipeline here.
>
> > And, assuming the former, some reflections that came to mind when
> > reviewing the changes:
> >
> > Will the update_compatibility_version option be effective both when
> > creating and updating a pipeline? It is grouped with the update options in
> > the Python SDK, but users may want to configure the compatibility already
> > when launching the pipeline.
>
> It will be effective for both, though generally there's little
> motivation to not always use the "latest" version when creating a new
> pipeline.
>
> > Would it be possible to revert setting a fixed prior version, i.e.
> > (re-)enable upgrades?
>
> The contract would be IF you start with version X (which logically
> defaults to the current SDK), THEN all updates also setting this to
> version X (even on SDKs > X) should work.
>
> > If yes: in practice, would this motivate another option, or passing a
> > value like "auto" or "latest" to update_compatibility_version?
>
> Unset is interpreted as latest. Auto is hard, because it would involve
> querying the runner before pipeline construction, and we may not even
> know what the runner is at this point. (Eventually we could do things
> like embed both alternatives into the graph and let the runner choose,
> but this is more speculative and may not be as scalable.)
>
> > The option is being introduced to the Java and Python SDKs. Should this
> > also be applicable to the Go SDK?
>
> Yes, allowing setting this value should be done for Go (and
> typescript, and future SDKs) too. As Robert Burke mentioned, we need
> to respect the value in those SDKs that have expansion service
> implementations first.
>
> > On Thu, Oct 26, 2023 at 2:25 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
> >>
> >> Dataflow (among other runners) has the ability to "upgrade" running
> >> pipelines with new code (e.g. capturing bug fixes, dependency updates,
> >> and limited topology changes). Unfortunately, some improvements (e.g.
> >> new and improved ways of writing to BigQuery, optimized use of side
> >> inputs, a change in algorithm, sometimes completely internal and not
> >> visible to the user) are not sufficiently backwards compatible. To
> >> avoid breaking users, we then either do not make these changes or
> >> guard them behind a parallel opt-in mode, which is a significant
> >> drain on developer productivity and causes new pipelines to run in
> >> obsolete modes by default.
> >>
> >> I created https://github.com/apache/beam/pull/29140 which adds a new
> >> pipeline option, update_compatibility_version, that allows the SDK to
> >> move forward while letting users with previously launched pipelines
> >> manually request the "old" way of doing things to preserve update
> >> compatibility. (We should still attempt backwards compatibility when
> >> it makes sense, and the old way would remain in code until such a time
> >> it's actually deprecated and removed, but this means we won't be
> >> constrained by it, especially when it comes to default settings.)
> >>
> >> Any objections or other thoughts on this approach?
> >>
> >> - Robert
> >>
> >> P.S. Separately I think it'd be valuable to elevate the vague notion
> >> of update compatibility to a first-class Beam concept and put it on
> >> firm footing, but that's a larger conversation outside the thread of
> >> this smaller (and I think still useful in such a future world) change.
>
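
The contract described in the thread (unset is interpreted as latest, and a pipeline pinned to version X keeps the behavior of version X even on newer SDKs) can be illustrated with a minimal sketch. This is not code from the PR: the function names, the version strings, and the "legacy"/"current" labels are all hypothetical and only model the idea of a transform choosing its implementation from update_compatibility_version.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.51.0' into (2, 51, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def effective_compatibility_version(requested: Optional[str], sdk_version: str) -> str:
    """Hypothetical helper: an unset option is interpreted as the current SDK version."""
    return sdk_version if requested is None else requested


def choose_implementation(
    requested: Optional[str], changed_in: str, sdk_version: str
) -> str:
    """Hypothetical helper: pick the old behavior when the pinned version
    predates the release in which an incompatible change was made."""
    effective = effective_compatibility_version(requested, sdk_version)
    if parse_version(effective) < parse_version(changed_in):
        return "legacy"
    return "current"


if __name__ == "__main__":
    # Assumed scenario: an incompatible change landed in 2.51.0 and the
    # update is submitted with SDK 2.53.0 (both version numbers are made up).
    print(choose_implementation("2.50.0", "2.51.0", "2.53.0"))  # pinned to an older version
    print(choose_implementation(None, "2.51.0", "2.53.0"))      # unset, so latest
```

Under these assumptions, a pipeline launched at version X and always updated with the same pinned value keeps resolving to the same implementation, whichever SDK performs the update.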
