[
https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383803#comment-16383803
]
ASF GitHub Bot commented on FLINK-8807:
---------------------------------------
GitHub user aljoscha opened a pull request:
https://github.com/apache/flink/pull/5623
[FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in
infinite loop
Before, CompletedCheckpoint did not have proper equals()/hashCode(),
which meant that the fixpoint condition in
ZooKeeperCompletedCheckpointStore would never hold if at least one
checkpoint became unreadable.
This adds proper equals()/hashCode() to CompletedCheckpoint and extends
the test to properly create new CompletedCheckpoints. Before, we were
reusing the same CompletedCheckpoint instances, meaning that the default
identity-based Object.equals()/hashCode() would make the test succeed.
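
To illustrate the failure mode described above, here is a minimal,
self-contained sketch. It is not the Flink code: FixpointSketch,
IdentityCheckpoint, ValueCheckpoint and recover() are hypothetical names,
and the loop only approximates the idea of re-reading checkpoints until
two consecutive attempts return the same list. The maxAttempts cap exists
only so that the non-converging case can be demonstrated without hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

final class FixpointSketch {

    /** Hypothetical checkpoint WITHOUT equals()/hashCode(): compared by identity. */
    static final class IdentityCheckpoint {
        final long id;

        IdentityCheckpoint(long id) {
            this.id = id;
        }
    }

    /** Hypothetical checkpoint WITH value-based equals()/hashCode(). */
    static final class ValueCheckpoint {
        final long id;

        ValueCheckpoint(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueCheckpoint && ((ValueCheckpoint) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    /**
     * Re-reads checkpoints 1..3 (checkpoint 2 is always unreadable) until either
     * all of them were read or two consecutive attempts produced equal lists.
     * Returns the number of attempts, or -1 if maxAttempts was exhausted.
     */
    static <T> int recover(LongFunction<T> factory, int maxAttempts) {
        long[] ids = {1L, 2L, 3L};
        long unreadable = 2L;
        List<T> previous = new ArrayList<>();
        List<T> current = new ArrayList<>();
        int attempts = 0;
        do {
            if (attempts == maxAttempts) {
                return -1; // safety cap, only present in this sketch
            }
            attempts++;
            previous = current;
            current = new ArrayList<>();
            for (long id : ids) {
                if (id != unreadable) {
                    // Every read creates a fresh instance, as deserialization would.
                    current.add(factory.apply(id));
                }
            }
            // List.equals() compares element by element using the elements' equals().
        } while (current.size() != ids.length && !previous.equals(current));
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(recover(IdentityCheckpoint::new, 10)); // prints -1
        System.out.println(recover(ValueCheckpoint::new, 10));    // prints 2
    }
}
```

With identity-based equality, every attempt yields new objects, so the two
lists never compare equal and the loop only stops at the artificial cap.
With value-based equality, the second attempt matches the first and the
loop ends. This also shows why a test that reuses the same instances on
every read would pass even without a proper equals().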
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aljoscha/flink jira-8807-zookeeper-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5623.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5623
----
commit 777ddb57ee72d200d1312dc8e6dfdb52af6b9950
Author: Aljoscha Krettek <aljoscha.krettek@...>
Date: 2018-03-02T16:46:56Z
[FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in
infinite loop
Before, CompletedCheckpoint did not have proper equals()/hashCode(),
which meant that the fixpoint condition in
ZooKeeperCompletedCheckpointStore would never hold if at least one
checkpoint became unreadable.
This adds proper equals()/hashCode() to CompletedCheckpoint and extends
the test to properly create new CompletedCheckpoints. Before, we were
reusing the same CompletedCheckpoint instances, meaning that the default
identity-based Object.equals()/hashCode() would make the test succeed.
----
> ZookeeperCompleted checkpoint store can get stuck in infinite loop
> ------------------------------------------------------------------
>
> Key: FLINK-8807
> URL: https://issues.apache.org/jira/browse/FLINK-8807
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.5.0
> Reporter: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.5.0
>
>
> This code:
> https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201
> can be stuck forever if at least one checkpoint is not readable because
> {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}}
> anymore.
> We have to fix this and also add a unit test that verifies the loop still
> works if we make one snapshot unreadable.