[jira] [Commented] (FLINK-5073) ZooKeeperCompleteCheckpointStore executes blocking delete operation in ZooKeeper client thread

ASF GitHub Bot (JIRA) Wed, 16 Nov 2016 00:08:44 -0800

    [ https://issues.apache.org/jira/browse/FLINK-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669777#comment-15669777 ]


ASF GitHub Bot commented on FLINK-5073:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/2816

    [backport] [FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore

    Backport of #2815 for the release-1.1 branch.
    
    Use a dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
    instead of running them in the ZooKeeper client's thread. The callbacks can block
    because they discard state, which might entail deleting files from disk.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink backportFixZooKeeperDelete

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2816
    
----
commit ae67fe9a3bbc768911b8eab8dc32d18c2cb10c1a
Author: Till Rohrmann <[email protected]>
Date:   2016-11-15T21:45:04Z

    [FLINK-5073] Use Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
    
    Use dedicated Executor to run ZooKeeper callbacks in ZooKeeperStateHandleStore
    instead of running it in the ZooKeeper client's thread. The callback can be
    blocking because it discards state which might entail deleting files from disk.
    
    Add TestExecutors

----


> ZooKeeperCompleteCheckpointStore executes blocking delete operation in 
> ZooKeeper client thread
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5073
>                 URL: https://issues.apache.org/jira/browse/FLINK-5073
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.2.0, 1.1.4
>
>
> When deleting completed checkpoints from the 
> {{ZooKeeperCompletedCheckpointStore}}, one first tries to delete the meta 
> state handle from ZooKeeper and then deletes the actual checkpoint in a 
> callback from the delete operation. This callback is executed by the 
> ZooKeeper client's main thread, which is problematic because it blocks the 
> ZooKeeper client. If a delete operation takes longer than it takes to 
> complete a checkpoint, delete operations for outdated checkpoints can even 
> pile up, because they are effectively executed sequentially.
> I propose to execute the delete operations with a dedicated {{Executor}} so 
> that the client's main thread stays free for ZooKeeper-related work.
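
The proposal in the issue description can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: `DeleteCallback`, `offload`, `runDemo`, the thread name and the path `/checkpoints/1` are all hypothetical names chosen for illustration, and no ZooKeeper client is involved. It only shows the general idea of handing a potentially blocking callback body to a dedicated `Executor` so that the thread invoking the callback returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CallbackOffloadSketch {

    /** Hypothetical stand-in for a callback invoked by a client thread after a delete. */
    interface DeleteCallback {
        void onDeleted(String path);
    }

    /** Wraps a callback so that its body runs on the given executor, not on the calling thread. */
    static DeleteCallback offload(DeleteCallback blocking, Executor executor) {
        return path -> executor.execute(() -> blocking.onDeleted(path));
    }

    /** Invokes an offloaded callback and returns the name of the thread that ran its body. */
    static String runDemo() {
        ExecutorService executor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "discard-executor"));
        CountDownLatch done = new CountDownLatch(1);
        String[] ranOn = new String[1];

        // Stand-in for the blocking work (for example, discarding state on disk).
        DeleteCallback discardState = path -> {
            ranOn[0] = Thread.currentThread().getName();
            done.countDown();
        };

        try {
            // The calling thread only enqueues the work and returns right away.
            offload(discardState, executor).onDeleted("/checkpoints/1");
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callback did not finish in time");
            }
            return ranOn[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("callback body ran on: " + runDemo());
    }
}
```

With a single-threaded executor, as assumed here, the discard operations are still executed one after another, but they no longer occupy the thread that delivers the callbacks; a pool with more threads would be needed to run them in parallel.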



