Can you give more insight into how the Flink update works? I was only
aware of the Dataflow one.


On Fri, Nov 2, 2018 at 12:35 PM Reuven Lax <re...@google.com> wrote:

> This is not quite true - people often update Flink pipelines as well.
>
> On Fri, Nov 2, 2018 at 10:50 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> +Reuven Lax <re...@google.com> for update proposal
>>
>> Dataflow is the only Apache Beam runner that has the capability of
>> updating pipelines. This page[1] describes many aspects of how it
>> works and specifically covers coder changes:
>>
>>
>>    - *Changing the Coder for a step.* When you update a job, the Cloud
>>    Dataflow service preserves any data records currently buffered (for
>>    example, while windowing
>>    <https://beam.apache.org/documentation/programming-guide/#windowing> is
>>    resolving) and handles them in the replacement job. If the replacement
>>    job uses different or incompatible data encoding
>>    <https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety>,
>>    the Cloud Dataflow service will not be able to serialize or deserialize
>>    these records.
>>
>>    *Caution:* The Cloud Dataflow service currently cannot guarantee that
>>    changing a coder in your prior pipeline to an incompatible coder will
>>    cause the compatibility check to fail. It is recommended that you *do
>>    not* attempt to make backwards-incompatible changes to Coders when
>>    updating your pipeline; if your pipeline update succeeds but you
>>    encounter issues or errors in the resulting data, ensure that your
>>    replacement pipeline uses data encoding that's the same as, or at
>>    least compatible with, your prior job.
>>
>> There has been a proposal[2] for general update support within Apache
>> Beam, but it has gained little traction for implementation outside of
>> Dataflow.
>>
>> Looking at your code, it wouldn't work with update because encoded values
>> are concatenated together without an element delimiter in many situations.
>> Hence, when you decode a value that was written in the old format with
>> your new coder, you would read into the next value, corrupting the rest of
>> the stream. If you really need to change the encoding in a
>> backwards-incompatible way, you would need to change the "name" of the
>> coder, which currently defaults to the class name.
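>>
>> To make that concrete, here is a minimal, hypothetical demonstration (toy
>> string/long coders, not your actual coder) of how decoding old bytes with
>> a new coder silently reads into the next record:
>>
>> import java.io.ByteArrayInputStream;
>> import java.io.ByteArrayOutputStream;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import org.apache.beam.sdk.coders.StringUtf8Coder;
>> import org.apache.beam.sdk.coders.VarLongCoder;
>>
>> // Two records encoded back to back by the "old" format (a bare string).
>> // The "new" format expects [string][varlong], so the varlong read
>> // consumes bytes that belong to the second record.
>> public class IncompatibleCoderDemo {
>>   public static void main(String[] args) throws IOException {
>>     ByteArrayOutputStream buffered = new ByteArrayOutputStream();
>>     StringUtf8Coder.of().encode("first", buffered);   // old record 1
>>     StringUtf8Coder.of().encode("second", buffered);  // old record 2
>>
>>     InputStream in = new ByteArrayInputStream(buffered.toByteArray());
>>     String value = StringUtf8Coder.of().decode(in);
>>     // No delimiter marks the record boundary, so this "succeeds" by
>>     // consuming the length prefix of record 2 rather than failing.
>>     long corrupt = VarLongCoder.of().decode(in);
>>     System.out.println(value + " / " + corrupt);  // prints "first / 6"
>>   }
>> }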
>>
>> 1: https://cloud.google.com/dataflow/pipelines/updating-a-pipeline
>> 2: https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
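>>
>> A minimal sketch of that rename approach, using a hypothetical Event type
>> and EventCoderV2 (stand-ins, not the actual MatchResult.Metadata change):
>>
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.io.OutputStream;
>> import org.apache.beam.sdk.coders.CustomCoder;
>> import org.apache.beam.sdk.coders.StringUtf8Coder;
>> import org.apache.beam.sdk.coders.VarLongCoder;
>>
>> // Toy value type standing in for MatchResult.Metadata.
>> class Event {
>>   final String id;
>>   final long lastModifiedMillis;
>>
>>   Event(String id, long lastModifiedMillis) {
>>     this.id = id;
>>     this.lastModifiedMillis = lastModifiedMillis;
>>   }
>> }
>>
>> // A new class, and therefore a new coder "name": the compatibility check
>> // sees a different coder rather than silently misreading bytes that were
>> // written by the old coder class.
>> class EventCoderV2 extends CustomCoder<Event> {
>>   @Override
>>   public void encode(Event value, OutputStream outStream) throws IOException {
>>     StringUtf8Coder.of().encode(value.id, outStream);
>>     VarLongCoder.of().encode(value.lastModifiedMillis, outStream); // new field
>>   }
>>
>>   @Override
>>   public Event decode(InputStream inStream) throws IOException {
>>     String id = StringUtf8Coder.of().decode(inStream);
>>     long lastModifiedMillis = VarLongCoder.of().decode(inStream);
>>     return new Event(id, lastModifiedMillis);
>>   }
>> }
>>
>> The point of giving the coder a deliberately new identity is that an
>> incompatible change fails fast instead of corrupting reads.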
>>
>>
>> On Fri, Nov 2, 2018 at 5:44 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0],
>>> which also requires updating MetadataCoder, but it's not clear to me
>>> whether there are guidelines to follow when evolving a type in a way
>>> that changes its encoding.
>>>
>>> Is a user allowed to update Beam library versions as part of updating a
>>> pipeline? If so, there could be a situation where an updated pipeline is
>>> reading state that includes Metadata encoded without the new
>>> lastModifiedMillis field, which would cause a CodingException to be thrown.
>>>
>>> Is there prior art for evolving a type and its Coder? Should I be
>>> defensive and catch CodingException when attempting to decode the new
>>> field, providing a default value?
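>>>
>>> Concretely, I'm imagining something like this defensive decode (a
>>> sketch; decodeLastModifiedMillis is a hypothetical helper that the
>>> updated coder would call after decoding the existing fields):
>>>
>>> import java.io.IOException;
>>> import java.io.InputStream;
>>> import org.apache.beam.sdk.coders.VarLongCoder;
>>>
>>> class DefensiveDecode {
>>>   // Try to decode the new field; fall back to a default when the bytes
>>>   // came from the old encoding and the field is absent.
>>>   static long decodeLastModifiedMillis(InputStream inStream) {
>>>     try {
>>>       return VarLongCoder.of().decode(inStream);
>>>     } catch (IOException e) {
>>>       return 0L; // default value for state written by the old coder
>>>     }
>>>   }
>>> }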
>>>
>>> [0] https://github.com/apache/beam/pull/6914
>>>
>>
