GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3451
[backport-1.1] [FLINK-5940] [checkpoint] Harden ZooKeeperCompletedCheckpointStore.recover method
Backport of #3446 onto the `release-1.1` branch.
The ZooKeeperCompletedCheckpointStore currently tries to recover only the
latest completed checkpoint, even though it might have read older checkpoint
state handles from ZooKeeper. To deal with corrupted state handles, this
commit changes the behaviour so that the completed checkpoint store keeps all
retrievable state handles read from ZooKeeper. When the latest checkpoint is
requested, it returns the latest completed checkpoint that could actually be
retrieved from those state handles. Broken state handles are removed from the
completed checkpoint store and from ZooKeeper.
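The recovery behaviour described above can be illustrated with a minimal,
self-contained sketch. It is not the code from this pull request: the
`StateHandle` interface, the `Store` class, the `handle` helper, and the
`removedFromZooKeeper` list are hypothetical stand-ins, and the ZooKeeper
interaction is only simulated by recording the ids of discarded handles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

class CheckpointRecoverySketch {

    /** Hypothetical stand-in for a retrievable state handle read from ZooKeeper. */
    public interface StateHandle {
        long checkpointId();

        /** Returns the checkpoint, or throws if the handle is corrupted. */
        String retrieve() throws Exception;
    }

    /** Hypothetical, simplified completed checkpoint store. */
    public static class Store {
        // Handles ordered from oldest to newest.
        private final Deque<StateHandle> handles = new ArrayDeque<>();

        // Assumption: stands in for deleting the broken node from ZooKeeper.
        public final List<Long> removedFromZooKeeper = new ArrayList<>();

        /** Keeps every handle that was read, not just the newest one. */
        public void recover(List<StateHandle> readFromZooKeeper) {
            handles.clear();
            handles.addAll(readFromZooKeeper);
        }

        /** Walks from newest to oldest and returns the first retrievable checkpoint. */
        public Optional<String> getLatestCheckpoint() {
            Iterator<StateHandle> newestFirst = handles.descendingIterator();
            while (newestFirst.hasNext()) {
                StateHandle handle = newestFirst.next();
                try {
                    return Optional.of(handle.retrieve());
                } catch (Exception e) {
                    // Broken handle: drop it from the store and from "ZooKeeper".
                    newestFirst.remove();
                    removedFromZooKeeper.add(handle.checkpointId());
                }
            }
            return Optional.empty();
        }

        public int size() {
            return handles.size();
        }
    }

    /** Hypothetical helper that builds a healthy or a corrupted handle. */
    public static StateHandle handle(final long id, final boolean broken) {
        return new StateHandle() {
            @Override
            public long checkpointId() {
                return id;
            }

            @Override
            public String retrieve() throws Exception {
                if (broken) {
                    throw new Exception("corrupted state handle " + id);
                }
                return "checkpoint-" + id;
            }
        };
    }

    public static void main(String[] args) {
        Store store = new Store();
        // The newest handle (id 3) is corrupted, so recovery falls back to id 2.
        store.recover(Arrays.asList(handle(1, false), handle(2, false), handle(3, true)));
        System.out.println(store.getLatestCheckpoint());
        System.out.println(store.removedFromZooKeeper);
    }
}
```

In this sketch a broken handle is only discarded when a lookup actually
touches it, so older handles that are never needed are never retrieved.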
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixCheckpointRecoveryBp1.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3451.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3451