This is not quite true - people often update Flink pipelines as well.

On Fri, Nov 2, 2018 at 10:50 AM Lukasz Cwik <lc...@google.com> wrote:
> +Reuven Lax <re...@google.com> for update proposal
>
> Dataflow is the only Apache Beam runner which has the capability of
> updating pipelines. This page[1] describes many of the aspects of how it
> works and specifically talks about coder changes:
>
>    - *Changing the Coder for a step.* When you update a job, the Cloud
>    Dataflow service preserves any data records currently buffered (for
>    example, while windowing
>    <https://beam.apache.org/documentation/programming-guide/#windowing>
>    is resolving) and handles them in the replacement job. If the
>    replacement job uses different or incompatible data encoding
>    <https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety>,
>    the Cloud Dataflow service will not be able to serialize or
>    deserialize these records.
>
>    *Caution:* The Cloud Dataflow service currently cannot guarantee that
>    changing a coder in your prior pipeline to an incompatible coder will
>    cause the compatibility check to fail. It is recommended that you
>    *do not* attempt to make backwards-incompatible changes to Coders when
>    updating your pipeline; if your pipeline update succeeds but you
>    encounter issues or errors in the resulting data, ensure that your
>    replacement pipeline uses data encoding that's the same as, or at
>    least compatible with, your prior job.
>
> There has been a proposal[2] for general update support within Apache
> Beam, but it has gained little traction for implementation outside of
> Dataflow.
>
> Looking at your code, it wouldn't work with update because encoded values
> are concatenated together without an element delimiter in many situations.
> Hence, when you decode a value that was written in the old format using
> your new coder, you would read into the next value, corrupting your read.
> If you really need to change the encoding in a backwards-incompatible way,
> you would need to change the "name" of the coder, which currently defaults
> to the class name.
>
> 1: https://cloud.google.com/dataflow/pipelines/updating-a-pipeline
> 2: http://doc/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>
>
> On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0],
>> which also requires updating MetadataCoder, but it's not clear to me
>> whether there are guidelines to follow when evolving a type in a way
>> that changes its encoding.
>>
>> Is a user allowed to update Beam library versions as part of updating a
>> pipeline? If so, there could be a situation where an updated pipeline is
>> reading state that includes Metadata encoded without the new
>> lastModifiedMillis field, which would cause a CodingException to be
>> thrown.
>>
>> Is there prior art for evolving a type and its Coder? Should I be
>> defensive and catch CodingException when attempting to decode the new
>> field, providing a default value?
>>
>> [0] https://github.com/apache/beam/pull/6914
>>
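For what it's worth, here is a minimal sketch of the "new class, new name"
approach Lukasz describes above, where the new wire format gets its own coder
class so the coder name (which defaults to the class name) changes along with
the encoding. MyMetadata and MyMetadataCoderV2 are hypothetical stand-ins for
MatchResult.Metadata and MetadataCoder, and the field layout is invented for
illustration; it is not the actual encoding used by Beam.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.AtomicCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;

// Hypothetical stand-in for MatchResult.Metadata; the real class lives in
// org.apache.beam.sdk.io.fs and is constructed differently.
class MyMetadata {
  final String resourceId;        // simplified stand-in for ResourceId
  final long sizeBytes;
  final long lastModifiedMillis;  // the newly added field

  MyMetadata(String resourceId, long sizeBytes, long lastModifiedMillis) {
    this.resourceId = resourceId;
    this.sizeBytes = sizeBytes;
    this.lastModifiedMillis = lastModifiedMillis;
  }
}

// The new format gets a new class, so the coder "name" (defaulting to the
// class name) differs from the old coder's. The old coder class is left
// untouched for pipelines that still carry bytes in the old format.
class MyMetadataCoderV2 extends AtomicCoder<MyMetadata> {
  private static final StringUtf8Coder STRING_CODER = StringUtf8Coder.of();
  private static final VarLongCoder LONG_CODER = VarLongCoder.of();

  private MyMetadataCoderV2() {}

  public static MyMetadataCoderV2 of() {
    return new MyMetadataCoderV2();
  }

  @Override
  public void encode(MyMetadata value, OutputStream outStream) throws IOException {
    STRING_CODER.encode(value.resourceId, outStream);
    LONG_CODER.encode(value.sizeBytes, outStream);
    // The new field is appended at the end; only V2 readers expect it.
    LONG_CODER.encode(value.lastModifiedMillis, outStream);
  }

  @Override
  public MyMetadata decode(InputStream inStream) throws IOException {
    String resourceId = STRING_CODER.decode(inStream);
    long sizeBytes = LONG_CODER.decode(inStream);
    long lastModifiedMillis = LONG_CODER.decode(inStream);
    return new MyMetadata(resourceId, sizeBytes, lastModifiedMillis);
  }
}

On the defensive-decode idea from the original question: catching the decode
failure and defaulting lastModifiedMillis would only be safe when the value is
the last thing in the stream. As Lukasz points out, encoded values are often
concatenated with no element delimiter, so a missing trailing field cannot
reliably be distinguished from the start of the next element.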