@Yun Tang <myas...@live.com> As the author of the referenced PR, could you please take a look here? Thanks.
@Jeffrey: For your second question, with an officially released Flink version, a checkpoint is safe to restore from once it has completed successfully [1]. The issue you encountered was probably caused by a bug in the unmerged PR, so let's wait for the author's answer.

Hope this helps.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/internals/stream_checkpointing.html

Best Regards,
Yu

On Thu, 14 Nov 2019 at 06:27, Jeffrey Martin <jeffrey.martin...@gmail.com> wrote:

> Hi all,
>
> I'm using protobufs as keys of a Flink stream using code copied from this
> pull request <https://github.com/apache/flink/pull/7598>, but
> deserialization is failing after checkpoint restore due to missing data.
>
> I'm using HDFS and the RocksDB backend. I tried providing the path to a
> previous retained checkpoint (i.e.,
> .../flink-checkpoints/{jobIdHex}/chk-14/_metadata). The proto deserializer
> failed on a serialized record that had been truncated and was missing its
> last 20 bytes out of 77 total.
>
> The same serializers work fine if I don't try restoring from a checkpoint,
> worked fine for a different job, are fairly well unit-tested, and mostly
> just delegate to the protobuf serde code so I'm pretty certain my
> serializer is not the issue. Which means I'm doing something else wrong.
>
> Questions:
> 1. Have others encountered issues like this?
> 2. How do I know when a checkpoint has been completed and is safe to
> restore from? (Is checkpoint completion atomic?)
>
> Thanks,
>
> Jeff Martin
>