JingsongLi commented on issue #1438: URL: https://github.com/apache/iceberg/issues/1438#issuecomment-693144323
## State compatibility

Let's focus on @rdblue's concern first: the case where Iceberg is upgraded and can no longer read older state.

Currently, as @stevenzwu said, only the `DataFile` is in Flink state, and it is serialized by Kryo. Although Kryo does not rely on `serialVersionUID`, adding or removing fields will still cause compatibility problems.

I suggest that we implement a Flink serializer for `DataFile`:
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/custom_serialization.html#state-serializers-and-schema-evolution

When restoring from savepoints, Flink allows changing the serializers used to read and write previously registered state, so that users are not locked in to any specific serialization schema.

## Job Topology compatibility

This is the case where we want to upgrade an existing Flink job to a new Iceberg version. As @stevenzwu said, the premise is that a Flink cluster has different versions of the binaries. I don't think this is very important, but if we want to solve it, we can define custom serializers for all the Iceberg classes that are final members of `RowDataTaskWriterFactory`. That would be a little cumbersome, though.

@stevenzwu
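
To make the serializer suggestion for `DataFile` more concrete, here is a minimal sketch of a versioned binary format that can still read an older layout after a field has been added. It uses only `java.io` and is not the Flink API: in a real job this logic would presumably sit inside a Flink `TypeSerializer` with a matching `TypeSerializerSnapshot`. The class `VersionedDataFileCodec`, the nested `Snapshot` type, its three fields, and the assumption that `fileSizeInBytes` was added in version 2 are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec; it does not use any Iceberg or Flink classes.
final class VersionedDataFileCodec {
  static final int CURRENT_VERSION = 2;

  // Hypothetical stand-in for the DataFile fields kept in state.
  static final class Snapshot {
    final String path;
    final long recordCount;
    final long fileSizeInBytes; // assumed to be added in version 2; -1 if unknown

    Snapshot(String path, long recordCount, long fileSizeInBytes) {
      this.path = path;
      this.recordCount = recordCount;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // Writes the current layout: version tag first, then all fields.
  static byte[] serialize(Snapshot file) {
    return write(file, CURRENT_VERSION);
  }

  // Writes the old layout; kept only to demonstrate reading older state.
  static byte[] serializeV1(Snapshot file) {
    return write(file, 1);
  }

  private static byte[] write(Snapshot file, int version) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(version);
      out.writeUTF(file.path);
      out.writeLong(file.recordCount);
      if (version >= 2) {
        out.writeLong(file.fileSizeInBytes);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the version tag, then only the fields that version contains.
  static Snapshot deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int version = in.readInt();
      if (version < 1 || version > CURRENT_VERSION) {
        throw new IOException("Unsupported format version: " + version);
      }
      String path = in.readUTF();
      long recordCount = in.readLong();
      // Older state has no size field, so fall back to a default.
      long size = version >= 2 ? in.readLong() : -1L;
      return new Snapshot(path, recordCount, size);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The explicit version tag is the design choice that matters here: unlike Kryo's default field-based layout, the reader decides per version which fields to expect, so adding a field does not invalidate state written by the previous release.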
