Can you give more insight into how the Flink update works? I was only aware of the Dataflow one.
On Fri, Nov 2, 2018 at 12:35 PM Reuven Lax <re...@google.com> wrote:

> This is not quite true - people often update Flink pipelines as well.
>
> On Fri, Nov 2, 2018 at 10:50 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> +Reuven Lax <re...@google.com> for update proposal
>>
>> Dataflow is the only Apache Beam runner which has the capability of
>> updating pipelines. This page[1] describes many of the aspects of how
>> it works and specifically talks about coder changes:
>>
>> - *Changing the Coder for a step.* When you update a job, the Cloud
>> Dataflow service preserves any data records currently buffered (for
>> example, while windowing
>> <https://beam.apache.org/documentation/programming-guide/#windowing>
>> is resolving) and handles them in the replacement job. If the
>> replacement job uses different or incompatible data encoding
>> <https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety>,
>> the Cloud Dataflow service will not be able to serialize or
>> deserialize these records.
>>
>> *Caution:* The Cloud Dataflow service currently cannot guarantee that
>> changing a coder in your prior pipeline to an incompatible coder will
>> cause the compatibility check to fail. It is recommended that you *do
>> not* attempt to make backwards-incompatible changes to Coders when
>> updating your pipeline; if your pipeline update succeeds but you
>> encounter issues or errors in the resulting data, ensure that your
>> replacement pipeline uses data encoding that's the same as, or at
>> least compatible with, your prior job.
>>
>> There has been a proposal[2] for general update support within Apache
>> Beam, but it has gained little traction for implementation outside of
>> Dataflow.
>>
>> Looking at your code, it wouldn't work with update because encoded
>> values are concatenated together without an element delimiter in many
>> situations. Hence, when you decode a value written in the old format
>> with your new coder, you would read into the next value, corrupting
>> your read. If you really need to change the encoding in a
>> backwards-incompatible way, you would need to change the "name" of
>> the coder, which currently defaults to the class name.
>>
>> 1: https://cloud.google.com/dataflow/pipelines/updating-a-pipeline
>> 2: http://doc/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>>
>> On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0],
>>> which also requires updating MetadataCoder, but it's not clear to me
>>> whether there are guidelines to follow when evolving a type in a way
>>> that changes its encoding.
>>>
>>> Is a user allowed to update Beam library versions as part of updating
>>> a pipeline? If so, there could be a situation where an updated
>>> pipeline is reading state that includes Metadata encoded without the
>>> new lastModifiedMillis field, which would cause a CoderException to
>>> be thrown.
>>>
>>> Is there prior art for evolving a type and its Coder? Should I be
>>> defensive and catch CoderException when attempting to decode the new
>>> field, providing a default value?
>>>
>>> [0] https://github.com/apache/beam/pull/6914
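
For what it's worth, here is a rough Java sketch of the approach Lukasz
describes. The FileRecord type and coder names below are made up for
illustration (the real types in question are MatchResult.Metadata and
MetadataCoder); the point is that the new encoding lives under a new
class name, so the coder "name" changes too:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;

// Stand-in for a type like MatchResult.Metadata that gains a new field.
class FileRecord {
  final String path;
  final long lastModifiedMillis; // new field; 0 when unknown

  FileRecord(String path, long lastModifiedMillis) {
    this.path = path;
    this.lastModifiedMillis = lastModifiedMillis;
  }
}

// Original encoding: path only. Left untouched so buffered state written
// by a running pipeline can still be decoded.
class FileRecordCoder extends CustomCoder<FileRecord> {
  @Override
  public void encode(FileRecord value, OutputStream out) throws IOException {
    StringUtf8Coder.of().encode(value.path, out);
  }

  @Override
  public FileRecord decode(InputStream in) throws IOException {
    return new FileRecord(StringUtf8Coder.of().decode(in), 0L);
  }
}

// New encoding under a new class name (and therefore a new default coder
// name): appends lastModifiedMillis after the path. Keeping it distinct
// from FileRecordCoder means the update compatibility check can tell the
// two encodings apart instead of silently mis-reading old-format bytes.
class FileRecordCoderV2 extends CustomCoder<FileRecord> {
  @Override
  public void encode(FileRecord value, OutputStream out) throws IOException {
    StringUtf8Coder.of().encode(value.path, out);
    VarLongCoder.of().encode(value.lastModifiedMillis, out);
  }

  @Override
  public FileRecord decode(InputStream in) throws IOException {
    String path = StringUtf8Coder.of().decode(in);
    long millis = VarLongCoder.of().decode(in);
    return new FileRecord(path, millis);
  }
}

This also illustrates why catching CoderException around the new field
wouldn't be safe: since encoded elements are concatenated without
delimiters, a decoder that optimistically reads the extra long from
old-format bytes would typically consume the start of the next element
rather than fail cleanly.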