[
https://issues.apache.org/jira/browse/STORM-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047686#comment-15047686
]
ASF GitHub Bot commented on STORM-1376:
---------------------------------------
GitHub user redsanket opened a pull request:
https://github.com/apache/storm/pull/933
[STORM-1376] Zk slowing down due to many connections
Apologies; the unnecessary creation of clients and closing of connections puts
load on ZooKeeper.
Having a single client connection perform the reads helps.
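The fix described above, reusing one long-lived client instead of creating and closing a connection per read, can be sketched generically. The `FakeZkClient` class below is a hypothetical stand-in (not Storm's actual ZooKeeper client) that simply counts connection opens, to contrast the two patterns:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ZooKeeper client: each construction
// represents one new ZK session, which the server must track.
class FakeZkClient {
    static final AtomicInteger OPENS = new AtomicInteger();
    FakeZkClient() { OPENS.incrementAndGet(); }
    byte[] read(String path) { return new byte[0]; }
    void close() { }  // session teardown also costs the server work
}

public class SharedClientSketch {
    // Anti-pattern: a fresh client (and ZK session) per read.
    static byte[] readPerCall(String path) {
        FakeZkClient c = new FakeZkClient();
        try { return c.read(path); } finally { c.close(); }
    }

    // Fix: one shared, long-lived client serves all reads.
    static final FakeZkClient SHARED = new FakeZkClient();
    static byte[] readShared(String path) { return SHARED.read(path); }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) readPerCall("/storm/blobstore");
        int perCallOpens = FakeZkClient.OPENS.get() - 1; // exclude SHARED
        for (int i = 0; i < 100; i++) readShared("/storm/blobstore");
        int sharedOpens = FakeZkClient.OPENS.get() - 1 - perCallOpens;
        System.out.println(perCallOpens + " vs " + sharedOpens); // prints "100 vs 0"
    }
}
```

The point of the sketch: 100 reads cost 100 session open/close cycles in the per-call pattern but zero additional sessions with the shared client, which is the load reduction the patch is after.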
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/redsanket/storm zk-slowing-down
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/933.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #933
----
commit f27e5bce2c7fe2e06cef7746227a01099d7d32d3
Author: Sanket <schintap@untilservice-lm>
Date: 2015-12-08T22:58:28Z
zk slowing down due to many connections
commit c511a7e5581c2aff5c4ddd25c5507950fea08005
Author: Sanket <schintap@untilservice-lm>
Date: 2015-12-08T23:12:02Z
removing close method as shutdown does that
----
> ZK Becoming deadlocked with zookeeper_state_factory
> ---------------------------------------------------
>
> Key: STORM-1376
> URL: https://issues.apache.org/jira/browse/STORM-1376
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.11.0
> Reporter: Daniel Schonfeld
> Assignee: Sanket Reddy
> Priority: Blocker
>
> Since the introduction of the blobstore and pacemaker, we've noticed that when
> using nimbus with the new zookeeper_state_factory backing the cluster-state
> module, some of our ZK nodes become unresponsive and show an increasing
> amount of outstanding requests (STAT 4-letter command).
> Terminating the storm supervisors and nimbus usually gets ZooKeeper to realize
> after a few minutes that those connections are dead and to become responsive
> again. In some extreme cases we have to kill the ZK node and bring it back
> up.
> Our topologies ran across ~10 supervisor nodes, each with ~400-500 executors.
> I mention the number of executors because I am not sure whether someone
> mistakenly made each executor start sending heartbeats instead of each worker,
> which might possibly be the reason for this slowdown.
> One final note: if someone can jot down a few ideas about why this might be
> happening, I'd be more than happy to dig further into the Storm code and
> submit a PR myself. But I need a hint or some direction on where to go with this...
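As a diagnostic aside, the "outstanding requests" figure the report refers to comes from ZooKeeper's `stat` four-letter command. A minimal check might look like this (the host and port are placeholders, not values from the report):

```shell
# Send the "stat" 4-letter word to a ZooKeeper node and pull out the
# outstanding-request line; zk-host and 2181 are placeholder values.
echo stat | nc zk-host 2181 | grep -i outstanding
```

Note that newer ZooKeeper releases require four-letter commands to be enabled explicitly via the `4lw.commands.whitelist` server property; at the 3.4-era versions discussed here they were on by default.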
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)