Hi,
I'm looking into updating the Flink Runner to Flink version 1.8. Since
version 1.7 Flink has a new optional interface for Coder evolution*.
When a Flink pipeline is checkpointed, CoderSnapshots are written out
alongside with the checkpointed data. When the pipeline is restored from
that checkpoint, the CoderSnapshots are restored and used to
reinstantiate the Coders.
Furthermore, there is a compatibility and migration check between the
old and the new Coder. This allows to determine whether
- The serializer did not change or is compatible (ok)
- The serialization format of the coder changed (ok after migration)
- The coder needs to be reconfigured and we know how to that based on
the old version (ok after reconfiguration)
- The coder is incompatible (error)
I was wondering about the Coder evolution story in Beam. The current
state is that checkpointed Beam pipelines are only guaranteed to run
with the same Beam version and pipeline version. A newer version of
either might break the checkpoint format without any way to migrate the
state.
Should we start thinking about supporting Coder evolution in Beam?
Thanks,
Max
* Coders are called TypeSerializers in Flink land. The interface is
TypeSerializerSnapshot.