Alright, then it is clearer. Thank you for your answers!

On Thu, Oct 26, 2023, 20:36 Robert Bradshaw <rober...@google.com> wrote:
> On Thu, Oct 26, 2023 at 3:59 AM Johanna Öjeling <joha...@ojeling.net>
> wrote:
> >
> > Hi,
> >
> > I like this idea of making it easier to push out improvements, and had a
> > look at the PR.
> >
> > One question to better understand how it works today:
> >
> > The upgrades that the runners do, such as those not visible to the user,
> > can they be initiated at any time or do they only happen when the user
> > updates the running pipeline, e.g. with new user code?
>
> Correct. We're talking about user-initiated changes to their pipeline here.
>
> > And, assuming the former, some reflections that came to mind when
> > reviewing the changes:
> >
> > Will the update_compatibility_version option be effective both when
> > creating and updating a pipeline? It is grouped with the update options in
> > the Python SDK, but users may want to configure the compatibility already
> > when launching the pipeline.
>
> It will be effective for both, though generally there's little
> motivation to not always use the "latest" version when creating a new
> pipeline.
>
> > Would it be possible to revert setting a fixed prior version, i.e.
> > (re-)enable upgrades?
>
> The contract would be IF you start with version X (which logically
> defaults to the current SDK), THEN all updates also setting this to
> version X (even on SDKs > X) should work.
>
> > If yes: in practice, would this motivate another option, or passing a
> > value like "auto" or "latest" to update_compatibility_version?
>
> Unset is interpreted as latest. Auto is hard, because it would involve
> querying the runner before pipeline construction, and we may not even
> know what the runner is at this point. (Eventually we could do things
> like embed both alternatives into the graph and let the runner choose,
> but this is more speculative and may not be as scalable.)
>
> > The option is being introduced to the Java and Python SDKs. Should this
> > also be applicable to the Go SDK?
>
> Yes, allowing setting this value should be done for Go (and
> TypeScript, and future SDKs) too. As Robert Burke mentioned, we need
> to respect the value in those SDKs that have expansion service
> implementations first.
>
> > On Thu, Oct 26, 2023 at 2:25 AM Robert Bradshaw via dev <
> > dev@beam.apache.org> wrote:
> >>
> >> Dataflow (among other runners) has the ability to "upgrade" running
> >> pipelines with new code (e.g. capturing bug fixes, dependency updates,
> >> and limited topology changes). Unfortunately some improvements (e.g.
> >> new and improved ways of writing to BigQuery, optimized use of side
> >> inputs, a change in algorithm, sometimes completely internally and not
> >> visible to the user) are not sufficiently backwards compatible, which
> >> causes us, with the motivation to not break users, to either not make
> >> these changes or guard them as a parallel opt-in mode. That is a
> >> significant drain on developer productivity and causes new
> >> pipelines to run in obsolete modes by default.
> >>
> >> I created https://github.com/apache/beam/pull/29140 which adds a new
> >> pipeline option, update_compatibility_version, that allows the SDK to
> >> move forward while letting users with pipelines launched previously
> >> manually request the "old" way of doing things to preserve update
> >> compatibility. (We should still attempt backwards compatibility when
> >> it makes sense, and the old way would remain in code until such a time
> >> it's actually deprecated and removed, but this means we won't be
> >> constrained by it, especially when it comes to default settings.)
> >>
> >> Any objections or other thoughts on this approach?
> >>
> >> - Robert
> >>
> >> P.S. Separately I think it'd be valuable to elevate the vague notion
> >> of update compatibility to a first-class Beam concept and put it on
> >> firm footing, but that's a larger conversation outside the thread of
> >> this smaller (and I think still useful in such a future world) change.
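
To make the contract discussed in the thread concrete, here is a minimal Python sketch of how an SDK might gate an update-incompatible change on update_compatibility_version. It is not the code from PR 29140: the helper names, the plain dotted version format, and the "2.52.0" version at which the illustrative change lands are all assumptions. It encodes only two points from the thread: an unset value means the constructing SDK's own (latest) version, and a pipeline that keeps passing the same version X keeps getting the same expansion, even when it is updated from a newer SDK.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version like '2.51.0' into a comparable tuple.

    Assumption: versions are purely numeric (no '.dev' or 'rc' suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def effective_compatibility_version(requested: Optional[str],
                                    sdk_version: str) -> Tuple[int, ...]:
    """Unset is interpreted as 'latest', i.e. the SDK building the pipeline."""
    return parse_version(requested if requested else sdk_version)


def choose_expansion(requested: Optional[str],
                     sdk_version: str,
                     changed_in: str) -> str:
    """Pick the old or new expansion of a hypothetical transform.

    The new, update-incompatible expansion is used only when the effective
    compatibility version is at or after the version that introduced it.
    Otherwise the old expansion is kept, so updating a pipeline that was
    launched with compatibility version X still works on SDKs newer than X.
    """
    effective = effective_compatibility_version(requested, sdk_version)
    if effective >= parse_version(changed_in):
        return "new"
    return "old"
```

With a change assumed to land in 2.52.0: a pipeline launched from SDK 2.51.0 with the option unset gets the "old" expansion. An update from SDK 2.53.0 with the option still unset would switch to "new" and likely break update compatibility, while the same update passing update_compatibility_version=2.51.0 keeps "old".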