[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383803#comment-16383803 ]
ASF GitHub Bot commented on FLINK-8807:
---------------------------------------

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/5623

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop

    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.

    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, meaning that
    Objects.equals()/hashCode() would make the test succeed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-8807-zookeeper-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5623

----
commit 777ddb57ee72d200d1312dc8e6dfdb52af6b9950
Author: Aljoscha Krettek <aljoscha.krettek@...>
Date:   2018-03-02T16:46:56Z

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop

    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.

    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, meaning that
    Objects.equals()/hashCode() would make the test succeed.
----

> ZookeeperCompleted checkpoint store can get stuck in infinite loop
> ------------------------------------------------------------------
>
>                 Key: FLINK-8807
>                 URL: https://issues.apache.org/jira/browse/FLINK-8807
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> This code:
> https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201
> can be stuck forever if at least one checkpoint is not readable because
> {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}}
> anymore.
> We have to fix this and also add a unit test that verifies the loop still
> works if we make one snapshot unreadable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
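
For illustration only, the failure mode described above can be sketched in a few lines of Java. This is not the Flink code: Checkpoint is a hypothetical, simplified stand-in for CompletedCheckpoint, and readOnce(), recoverUntilStable() and the maxAttempts bound are assumptions made for this sketch. Each simulated read builds fresh instances and skips one "unreadable" checkpoint, so two consecutive reads can only compare equal if equals()/hashCode() are value-based.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

class FixpointSketch {

    /** Hypothetical, simplified stand-in for CompletedCheckpoint. */
    static final class Checkpoint {
        final String jobId;
        final long checkpointId;

        Checkpoint(String jobId, long checkpointId) {
            this.jobId = jobId;
            this.checkpointId = checkpointId;
        }

        // Value-based equality: two instances read separately from storage
        // compare equal when they describe the same checkpoint.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Checkpoint)) {
                return false;
            }
            Checkpoint that = (Checkpoint) o;
            return checkpointId == that.checkpointId && jobId.equals(that.jobId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobId, checkpointId);
        }
    }

    /**
     * Re-reads until two consecutive reads return equal lists (the fixpoint).
     * The maxAttempts bound is an addition for this sketch so that it cannot
     * spin forever; with identity-based equals() the fixpoint is never reached.
     */
    static List<Checkpoint> recoverUntilStable(Supplier<List<Checkpoint>> reader, int maxAttempts) {
        List<Checkpoint> previous = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<Checkpoint> current = reader.get();
            if (current.equals(previous)) {
                return current;
            }
            previous = current;
        }
        throw new IllegalStateException("no fixpoint after " + maxAttempts + " attempts");
    }

    /** Simulated read: checkpoints 1..3 exist, 2 is unreadable and skipped. */
    static List<Checkpoint> readOnce() {
        List<Checkpoint> result = new ArrayList<>();
        for (long id = 1; id <= 3; id++) {
            if (id == 2) {
                continue; // pretend this checkpoint could not be read
            }
            result.add(new Checkpoint("job-a", id)); // fresh instance on every read
        }
        return result;
    }

    public static void main(String[] args) {
        List<Checkpoint> stable = recoverUntilStable(FixpointSketch::readOnce, 10);
        System.out.println("stable checkpoints: " + stable.size());
    }
}
```

Removing the equals()/hashCode() overrides from Checkpoint makes recoverUntilStable() exhaust its attempts, because List.equals() falls back to identity comparison of the freshly created elements. Reusing the same instances across reads would hide this, which is the test weakness the pull request description mentions.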