+Reuven Lax <re...@google.com> for the update proposal

Dataflow is the only Apache Beam runner that supports updating pipelines.
This page[1] describes many aspects of how updates work and specifically
discusses coder changes:


   - *Changing the Coder for a step.* When you update a job, the Cloud
   Dataflow service preserves any data records currently buffered (for
   example, while windowing
   <https://beam.apache.org/documentation/programming-guide/#windowing> is
   resolving) and handles them in the replacement job. If the replacement job
   uses different or incompatible data encoding
   <https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety>,
   the Cloud Dataflow service will not be able to serialize or deserialize
   these records.

   *Caution:* The Cloud Dataflow service currently cannot guarantee that
   changing a coder in your prior pipeline to an incompatible coder will cause
   the compatibility check to fail. It is recommended that you *do not* attempt
   to make backwards-incompatible changes to Coders when updating your
   pipeline; if your pipeline update succeeds but you encounter issues or
   errors in the resulting data, ensure that your replacement pipeline uses
   data encoding that's the same as, or at least compatible with, your prior
   job.

There has been a proposal[2] for general update support within Apache Beam,
but it has gained little traction for implementation outside of Dataflow.

Looking at your code, it wouldn't work with update because, in many
situations, encoded values are concatenated together without an element
delimiter. Hence, when you decode a value written in the old format with
your new coder, you would read into the next value and corrupt the rest of
the stream. If you really need to change the encoding in a
backwards-incompatible way, you would need to change the "name" of the
coder, which currently defaults to the class name.
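Here is a minimal sketch of why that corrupts reads. The FileInfo record
and FileInfoCoderV1/V2 classes below are made up for illustration, not the
actual MatchResult.Metadata / MetadataCoder; the only point is that a coder
which appends a field misreads bytes written by its predecessor.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;

public class CoderEvolutionSketch {

  static class FileInfo {
    final String path;
    final long lastModifiedMillis; // field added in the "v2" encoding
    FileInfo(String path, long lastModifiedMillis) {
      this.path = path;
      this.lastModifiedMillis = lastModifiedMillis;
    }
  }

  // Old coder: writes only the path.
  static class FileInfoCoderV1 extends CustomCoder<FileInfo> {
    @Override
    public void encode(FileInfo value, OutputStream out) throws IOException {
      StringUtf8Coder.of().encode(value.path, out);
    }
    @Override
    public FileInfo decode(InputStream in) throws IOException {
      return new FileInfo(StringUtf8Coder.of().decode(in), 0L);
    }
  }

  // New coder: writes the path, then the new lastModifiedMillis field.
  static class FileInfoCoderV2 extends CustomCoder<FileInfo> {
    @Override
    public void encode(FileInfo value, OutputStream out) throws IOException {
      StringUtf8Coder.of().encode(value.path, out);
      VarLongCoder.of().encode(value.lastModifiedMillis, out);
    }
    @Override
    public FileInfo decode(InputStream in) throws IOException {
      return new FileInfo(
          StringUtf8Coder.of().decode(in), VarLongCoder.of().decode(in));
    }
  }

  public static void main(String[] args) throws IOException {
    // Buffered state often holds several elements encoded back to back with
    // no delimiter between them.
    ByteArrayOutputStream buffered = new ByteArrayOutputStream();
    FileInfoCoderV1 v1 = new FileInfoCoderV1();
    v1.encode(new FileInfo("gs://bucket/a", 0L), buffered);
    v1.encode(new FileInfo("gs://bucket/b", 0L), buffered);

    // After an update, the new coder decodes the old bytes: its extra read
    // for lastModifiedMillis consumes the length prefix of the *next*
    // element's path, so the stream is misaligned from then on.
    ByteArrayInputStream in = new ByteArrayInputStream(buffered.toByteArray());
    FileInfoCoderV2 v2 = new FileInfoCoderV2();
    FileInfo first = v2.decode(in);
    // first.lastModifiedMillis is actually the next element's length prefix,
    // not a timestamp.
    System.out.println(first.path + " lastModified=" + first.lastModifiedMillis);
    try {
      v2.decode(in); // reads from the middle of the second element's path
    } catch (IOException e) {
      System.out.println("second element is unreadable: " + e);
    }
  }
}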

1: https://cloud.google.com/dataflow/pipelines/updating-a-pipeline
2: http://doc/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY


On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0]
> which requires also updating MetadataCoder, but it's not clear to me
> whether there are guidelines to follow when evolving a type when that
> changes the encoding.
>
> Is a user allowed to update Beam library versions as part of updating a
> pipeline? If so, there could be a situation where an updated pipeline is
> reading state that includes Metadata encoded without the new
> lastModifiedMillis field, which would cause a CodingException to be thrown.
>
> Is there prior art for evolving a type and its Coder? Should I be
> defensive and catch CodingException when attempting to decode the new
> field, providing a default value?
>
> [0] https://github.com/apache/beam/pull/6914
>
