[
https://issues.apache.org/jira/browse/FLINK-6612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015629#comment-16015629
]
ASF GitHub Bot commented on FLINK-6612:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3939
[FLINK-6612] Allow ZooKeeperStateHandleStore to lock created ZNodes
In order to guard against deletions of ZooKeeper nodes which are still being
used by a different ZooKeeperStateHandleStore, we have to introduce a locking
mechanism. A ZNode may only be deleted after all ZooKeeperStateHandleStores
have released their locks on it.
The locking mechanism is implemented via ephemeral child nodes of the
respective ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to lock
a ZNode, thereby protecting it from being deleted, it creates an ephemeral
child node whose name is unique to the ZooKeeperStateHandleStore instance. The
delete operations will then only delete the node if it does not have any
children.
In order to guard against orphaned lock nodes, the lock nodes are created as
ephemeral nodes. This means that ZooKeeper deletes them once the session of
the ZooKeeper client which created them has timed out.
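The locking scheme described above can be illustrated with a small in-memory
model. This is only a sketch under stated assumptions, not the code in this
pull request: the class LockableNodeTree and all of its methods are
hypothetical, it does not talk to ZooKeeper, and it simplifies by using one id
per store instance as both the lock child name and the client session.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a ZooKeeper tree; not the ZooKeeper or Flink API. */
class LockableNodeTree {
    // Maps a node path to the names of its (simulated) ephemeral lock children.
    private final Map<String, Set<String>> lockChildren = new HashMap<>();

    /** Creates the node if it does not exist yet. */
    void create(String path) {
        lockChildren.putIfAbsent(path, new HashSet<>());
    }

    boolean exists(String path) {
        return lockChildren.containsKey(path);
    }

    /** Locks a node by adding a child named after the owner; fails if the node is missing. */
    boolean lock(String path, String ownerId) {
        Set<String> locks = lockChildren.get(path);
        if (locks == null) {
            return false;
        }
        locks.add(ownerId);
        return true;
    }

    /** Releases the owner's lock; returns false if the owner held no lock on the node. */
    boolean release(String path, String ownerId) {
        Set<String> locks = lockChildren.get(path);
        return locks != null && locks.remove(ownerId);
    }

    /** Deletes the node only if it has no lock children left. */
    boolean deleteIfUnlocked(String path) {
        Set<String> locks = lockChildren.get(path);
        if (locks == null || !locks.isEmpty()) {
            return false;
        }
        lockChildren.remove(path);
        return true;
    }

    /** Simulates a session timeout: all lock children of that owner disappear. */
    void expireSession(String ownerId) {
        for (Set<String> locks : lockChildren.values()) {
            locks.remove(ownerId);
        }
    }

    public static void main(String[] args) {
        LockableNodeTree tree = new LockableNodeTree();
        tree.create("/checkpoints/1");
        tree.lock("/checkpoints/1", "store-A");
        tree.lock("/checkpoints/1", "store-B");
        System.out.println(tree.deleteIfUnlocked("/checkpoints/1")); // prints "false"
        tree.release("/checkpoints/1", "store-A"); // explicit release
        tree.expireSession("store-B");             // orphaned lock cleaned up
        System.out.println(tree.deleteIfUnlocked("/checkpoints/1")); // prints "true"
    }
}
```

In the model, store-B never releases its lock explicitly; the node only
becomes deletable because the simulated session expiry removes the lock child,
which is the role the ephemeral nodes play in the description above.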
cc @StefanRRichter
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink addZooKeeperRefCounting
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3939.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3939
----
commit def077a8d95921645733169d420d548842dde257
Author: Till Rohrmann <[email protected]>
Date: 2017-05-17T12:52:04Z
[FLINK-6612] Allow ZooKeeperStateHandleStore to lock created ZNodes
----
> ZooKeeperStateHandleStore does not guard against concurrent delete operations
> -----------------------------------------------------------------------------
>
> Key: FLINK-6612
> URL: https://issues.apache.org/jira/browse/FLINK-6612
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, State Backends, Checkpointing
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> The {{ZooKeeperStateHandleStore}} does not guard against concurrent delete
> operations which could happen in case of a lost leadership and a new
> leadership grant. The problem is that checkpoint nodes can get deleted even
> after they have been recovered by another
> {{ZooKeeperCompletedCheckpointStore}}. This corrupts the recovered checkpoint
> and thwarts future recoveries.
> I propose to add reference counting to the {{ZooKeeperStateHandleStore}}.
> That way, we can monitor how many concurrent processes have a hold on a given
> checkpoint node. Only once the reference count reaches {{0}} are we allowed to
> delete the checkpoint node and dispose of the checkpoint data.
> Stephan proposed to use ephemeral child nodes to track the reference count of
> a checkpoint node. That way we can be sure that locks on a checkpoint node
> are released in case of {{JobManager}} failures.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)