Dataflow (among other runners) has the ability to "upgrade" running
pipelines with new code (e.g. capturing bug fixes, dependency updates,
and limited topology changes). Unfortunately, some improvements (e.g.
new and improved ways of writing to BigQuery, optimized use of side
inputs, or a change in algorithm, sometimes entirely internal and not
visible to the user) are not sufficiently backwards compatible. To
avoid breaking users, we either forgo these changes or guard them
behind a parallel opt-in mode, which is both a significant drain on
developer productivity and causes new pipelines to run in obsolete
modes by default.

I created https://github.com/apache/beam/pull/29140 which adds a new
pipeline option, update_compatibility_version, that allows the SDK to
move forward while letting users with previously launched pipelines
manually request the "old" way of doing things to preserve update
compatibility. (We should still attempt backwards compatibility when
it makes sense, and the old behavior would remain in the code until it
is actually deprecated and removed, but this means we won't be
constrained by it, especially when it comes to default settings.)
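
For the curious, a minimal sketch of what this might look like from
the user's side when updating a running Dataflow job (the option name
is from the PR above; the version value "2.50.0" is just a placeholder
for whatever SDK version the pipeline was originally launched with):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        # Request an in-place update of an existing Dataflow job.
        '--update',
        # Pin "the old way of doing things" to the SDK version the
        # pipeline was launched with, so the updated job stays
        # update-compatible even if defaults have since changed.
        '--update_compatibility_version=2.50.0',
    ])

    with beam.Pipeline(options=options) as p:
        _ = (p
             | beam.Create([1, 2, 3])
             | beam.Map(lambda x: x * 2))

New pipelines that don't set the option get the latest defaults.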

Any objections or other thoughts on this approach?

- Robert

P.S. Separately, I think it'd be valuable to elevate the vague notion
of update compatibility to a first-class Beam concept and put it on
firm footing, but that's a larger conversation beyond the scope of
this smaller (and, I think, still useful in such a future world) change.
