[
https://issues.apache.org/jira/browse/FLINK-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669777#comment-15669777
]
ASF GitHub Bot commented on FLINK-5073:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/2816
[backport] [FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
Backport of #2815 for the release-1.1 branch.
Use a dedicated Executor to run ZooKeeper callbacks in
ZooKeeperStateHandleStore instead of running them in the ZooKeeper
client's thread. The callback can block because it discards state,
which might entail deleting files from disk.
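The hand-off described above can be sketched with plain java.util.concurrent. This is only an illustration under stated assumptions, not the code from this pull request: `DeleteCallback`, `runOn`, the thread names and the path are hypothetical, and a single-threaded executor stands in for the ZooKeeper client's thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CallbackExecutorSketch {

    /** Hypothetical stand-in for the callback fired after a ZooKeeper delete. */
    public interface DeleteCallback {
        void onDeleted(String path) throws Exception;
    }

    /** Wraps a callback so that it runs on the given executor, not on the caller's thread. */
    public static DeleteCallback runOn(Executor executor, DeleteCallback delegate) {
        return path -> executor.execute(() -> {
            try {
                delegate.onDeleted(path);
            } catch (Exception e) {
                // A real implementation would log this; the sketch only reports it.
                System.err.println("Could not discard state for " + path + ": " + e);
            }
        });
    }

    /** Simulates one delete and returns the name of the thread that ran the callback. */
    public static String callbackThreadName() {
        // Assumption: one thread models the ZooKeeper client's event thread.
        ExecutorService clientThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "zk-client"));
        // Assumption: a second thread models the dedicated executor.
        ExecutorService ioExecutor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "discard-io"));
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        try {
            // The potentially blocking work; here it only records its thread.
            DeleteCallback discard = path -> ranOn.complete(Thread.currentThread().getName());
            DeleteCallback offloaded = runOn(ioExecutor, discard);
            // The simulated client thread only hands the work off and returns.
            clientThread.execute(() -> {
                try {
                    offloaded.onDeleted("/checkpoints/1");
                } catch (Exception e) {
                    ranOn.completeExceptionally(e);
                }
            });
            return ranOn.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            clientThread.shutdown();
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("callback ran on: " + callbackThreadName());
    }
}
```

Wrapping the callback keeps the calling code unchanged. Only the thread that performs the state discard differs.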
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink backportFixZooKeeperDelete
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2816.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2816
----
commit ae67fe9a3bbc768911b8eab8dc32d18c2cb10c1a
Author: Till Rohrmann
Date: 2016-11-15T21:45:04Z
[FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
Use dedicated Executor to run ZooKeeper callbacks in
ZooKeeperStateHandleStore instead of running it in the ZooKeeper
client's thread. The callback can be blocking because it discards
state which might entail deleting files from disk.
Add TestExecutors
----
> ZooKeeperCompletedCheckpointStore executes blocking delete operation in ZooKeeper client thread
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-5073
> URL: https://issues.apache.org/jira/browse/FLINK-5073
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Fix For: 1.2.0, 1.1.4
>
>
> When deleting completed checkpoints from the
> {{ZooKeeperCompletedCheckpointStore}}, one first tries to delete the meta
> state handle from ZooKeeper and then deletes the actual checkpoint in a
> callback from the delete operation. This callback is executed by the
> ZooKeeper client's main thread, which is problematic because it blocks the
> ZooKeeper client. If a delete operation takes longer than it takes to
> complete a checkpoint, delete operations for outdated checkpoints can
> even pile up, because they are effectively executed sequentially.
> I propose to execute the delete operations on a dedicated {{Executor}} so
> that the client's main thread stays free to do ZooKeeper-related work.
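The effect described in the issue can be reproduced with a small simulation. It is a sketch under stated assumptions: a single-threaded executor stands in for the ZooKeeper client's main thread, a latch stands in for a slow file deletion, and `clientStaysResponsive` is a hypothetical name. No ZooKeeper or Flink classes are involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedClientThreadDemo {

    /**
     * Returns true if other client work still gets through while a slow
     * checkpoint discard is in progress.
     *
     * @param offload whether the discard runs on a separate executor (true)
     *                or directly on the simulated client thread (false)
     */
    public static boolean clientStaysResponsive(boolean offload) {
        // Single thread standing in for the ZooKeeper client's main thread.
        ExecutorService clientThread = Executors.newSingleThreadExecutor();
        // Hypothetical dedicated executor for discarding checkpoint state.
        ExecutorService discardExecutor = Executors.newSingleThreadExecutor();
        // Kept closed to simulate a file deletion that has not finished yet.
        CountDownLatch slowDiskDelete = new CountDownLatch(1);
        try {
            Runnable discardState = () -> {
                try {
                    slowDiskDelete.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            // The delete callback fires on the client thread in both variants.
            clientThread.execute(() -> {
                if (offload) {
                    discardExecutor.execute(discardState);
                } else {
                    discardState.run();
                }
            });
            // Unrelated client work queued behind the callback.
            Future<?> otherWork = clientThread.submit(() -> { });
            try {
                otherWork.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Let the simulated deletion finish so both executors can shut down.
            slowDiskDelete.countDown();
            clientThread.shutdown();
            discardExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline discard, client responsive: " + clientStaysResponsive(false));
        System.out.println("offloaded discard, client responsive: " + clientStaysResponsive(true));
    }
}
```

With the discard running inline, the queued client work cannot start until the deletion finishes. With the dedicated executor, it runs right away.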