This is not quite true - people often update Flink pipelines as well.

On Fri, Nov 2, 2018 at 10:50 AM Lukasz Cwik <lc...@google.com> wrote:

> +Reuven Lax <re...@google.com> for update proposal
>
> Dataflow is the only Apache Beam runner which has the capability of
> updating pipelines. This page[1] describes many of the aspects of how it
> works and specifically talks about coder changes:
>
>
>    - *Changing the Coder for a step.* When you update a job, the Cloud
>    Dataflow service preserves any data records currently buffered (for
>    example, while windowing
>    <https://beam.apache.org/documentation/programming-guide/#windowing> is
>    resolving) and handles them in the replacement job. If the replacement
>    job uses different or incompatible data encoding
>    <https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety>,
>    the Cloud Dataflow service will not be able to serialize or deserialize
>    these records.
>
>    *Caution:* The Cloud Dataflow service currently cannot guarantee that
>    changing a coder in your prior pipeline to an incompatible coder will
>    cause the compatibility check to fail. It is recommended that you *do
>    not* attempt to make backwards-incompatible changes to Coders when
>    updating your pipeline; if your pipeline update succeeds but you
>    encounter issues or errors in the resulting data, ensure that your
>    replacement pipeline uses data encoding that's the same as, or at least
>    compatible with, your prior job.
>
> There has been a proposal[2] for general update support within Apache Beam,
> but it has gained little traction toward implementation outside of Dataflow.
>
> Looking at your code, it wouldn't work with update because in many
> situations encoded values are concatenated together without an element
> delimiter. Hence, when you decode a value written in the old format with
> your new coder, you would read into the next value and corrupt your read.
> If you really need to change the encoding in a backwards-incompatible way,
> you would need to change the "name" of the coder, which currently defaults
> to the class name.
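>
> As a rough sketch of that "new name" approach (the classes below are
> hypothetical stand-ins, not the actual SDK's MatchResult.Metadata or
> MetadataCoder, and the field layout is illustrative), the incompatible
> encoding would live in a separate coder class so its name, and therefore
> its identity, differs from the old coder's:
>
>   import java.io.IOException;
>   import java.io.InputStream;
>   import java.io.OutputStream;
>   import org.apache.beam.sdk.coders.AtomicCoder;
>   import org.apache.beam.sdk.coders.StringUtf8Coder;
>   import org.apache.beam.sdk.coders.VarLongCoder;
>
>   /** Hypothetical value type with the newly added lastModifiedMillis field. */
>   class FileStat {
>     final String path;
>     final long sizeBytes;
>     final long lastModifiedMillis;
>
>     FileStat(String path, long sizeBytes, long lastModifiedMillis) {
>       this.path = path;
>       this.sizeBytes = sizeBytes;
>       this.lastModifiedMillis = lastModifiedMillis;
>     }
>   }
>
>   /** New coder class (new name) instead of an in-place change to the old coder. */
>   class FileStatV2Coder extends AtomicCoder<FileStat> {
>     private static final StringUtf8Coder STRING_CODER = StringUtf8Coder.of();
>     private static final VarLongCoder LONG_CODER = VarLongCoder.of();
>
>     @Override
>     public void encode(FileStat value, OutputStream out) throws IOException {
>       STRING_CODER.encode(value.path, out);
>       LONG_CODER.encode(value.sizeBytes, out);
>       // New trailing field; the old (V1) coder never wrote this.
>       LONG_CODER.encode(value.lastModifiedMillis, out);
>     }
>
>     @Override
>     public FileStat decode(InputStream in) throws IOException {
>       String path = STRING_CODER.decode(in);
>       long sizeBytes = LONG_CODER.decode(in);
>       long lastModifiedMillis = LONG_CODER.decode(in);
>       return new FileStat(path, sizeBytes, lastModifiedMillis);
>     }
>   }
>
> Keeping the old coder class untouched and introducing a new one means the
> two encodings never share a name.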
>
> 1: https://cloud.google.com/dataflow/pipelines/updating-a-pipeline
> 2: http://doc/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>
>
> On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0],
>> which also requires updating MetadataCoder, but it's not clear to me
>> whether there are guidelines to follow when evolving a type in a way that
>> changes its encoding.
>>
>> Is a user allowed to update Beam library versions as part of updating a
>> pipeline? If so, there could be a situation where an updated pipeline is
>> reading state that includes Metadata encoded without the new
>> lastModifiedMillis field, which would cause a CoderException to be thrown.
>>
>> Is there prior art for evolving a type and its Coder? Should I be
>> defensive and catch CoderException when attempting to decode the new
>> field, providing a default value?
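>>
>> Something like this sketch, where the component coders, field layout,
>> value type, and default are illustrative assumptions rather than the
>> actual MetadataCoder internals:
>>
>>   // Illustrative fragment of a coder's decode method, assuming component
>>   // coders STRING_CODER (StringUtf8Coder) and LONG_CODER (VarLongCoder)
>>   // and a hypothetical FileStat value type with the new trailing field.
>>   @Override
>>   public FileStat decode(InputStream in) throws IOException {
>>     String path = STRING_CODER.decode(in);
>>     long sizeBytes = LONG_CODER.decode(in);
>>     long lastModifiedMillis = 0L;  // default when the old encoding is read
>>     try {
>>       lastModifiedMillis = LONG_CODER.decode(in);
>>     } catch (CoderException | EOFException e) {
>>       // Bytes written by the old coder end here; keep the default.
>>     }
>>     return new FileStat(path, sizeBytes, lastModifiedMillis);
>>   }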
>>
>> [0] https://github.com/apache/beam/pull/6914
>>
>
