Re: flink checkpoint state data corruption

Yu Li Thu, 14 Nov 2019 09:30:15 -0800

@Yun Tang <myas...@live.com> As the author of the referenced PR, it would
be great if you could help take a look here. Thanks.


@Jeffrey:
For your second question: with an officially released Flink, once a
checkpoint has completed successfully it is safe to restore from [1]. The
issue you encountered was probably caused by a bug in the unmerged PR, so
let's wait for the author's answer.
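
As a rough illustration of how one might confirm which checkpoint Flink
itself reports as completed, here is a minimal sketch. It assumes the JSON
body comes from the JobManager REST endpoint /jobs/<jobid>/checkpoints and
that its "latest.completed" entry carries "external_path" and "discarded"
fields; check the field names against the REST docs for your version. The
function name, the sample body and the path in it are all made up for
illustration.

```python
import json
from typing import Optional


def latest_completed_checkpoint_path(stats: dict) -> Optional[str]:
    """Return the external path of the latest completed checkpoint, or None.

    `stats` is assumed to be the parsed JSON body of the checkpoint
    statistics endpoint; the field names used here are assumptions.
    """
    completed = (stats.get("latest") or {}).get("completed")
    if not completed:
        # No checkpoint has been reported as completed yet.
        return None
    if completed.get("discarded"):
        # A discarded checkpoint no longer has its files on storage.
        return None
    return completed.get("external_path")


# Hypothetical, abbreviated response body, for illustration only.
sample = json.loads("""
{
  "latest": {
    "completed": {
      "id": 1,
      "status": "COMPLETED",
      "external_path": "hdfs:///tmp/example-checkpoints/0000/chk-1",
      "discarded": false
    }
  }
}
""")

print(latest_completed_checkpoint_path(sample))
```

In-progress or failed checkpoints never appear under "latest.completed", so
a path returned here refers to a checkpoint the JobManager has acknowledged
as complete.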

Hope this helps.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/internals/stream_checkpointing.html

Best Regards,
Yu


On Thu, 14 Nov 2019 at 06:27, Jeffrey Martin <jeffrey.martin...@gmail.com>
wrote:

> Hi all,
>
> I'm using protobufs as keys of a Flink stream using code copied from this
> pull request <https://github.com/apache/flink/pull/7598>, but
> deserialization is failing after checkpoint restore due to missing data.
>
> I'm using HDFS and the RocksDB backend. I tried providing the path to a
> previous retained checkpoint (i.e.,
> .../flink-checkpoints/{jobIdHex}/chk-14/_metadata). The proto deserializer
> failed on a serialized record that had been truncated and was missing its
> last 20 bytes out of 77 total.
>
> The same serializers work fine if I don't try restoring from a checkpoint,
> worked fine for a different job, are fairly well unit-tested, and mostly
> just delegate to the protobuf serde code so I'm pretty certain my
> serializer is not the issue. Which means I'm doing something else wrong.
>
> Questions:
> 1. Have others encountered issues like this?
> 2. How do I know when a checkpoint has been completed and is safe to
> restore from? (Is checkpoint completion atomic?)
>
> Thanks,
>
> Jeff Martin
>
