[
https://issues.apache.org/jira/browse/STORM-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056567#comment-15056567
]
ASF GitHub Bot commented on STORM-1376:
---------------------------------------
Github user ppoulosk commented on a diff in the pull request:
https://github.com/apache/storm/pull/933#discussion_r47546588
--- Diff: storm-core/test/clj/backtype/storm/nimbus_test.clj ---
@@ -1238,10 +1238,11 @@
(testing "nimbus-data uses correct ACLs"
(let [scheme "digest"
digest "storm:thisisapoorpassword"
- auth-conf {STORM-ZOOKEEPER-AUTH-SCHEME scheme
+ auth-conf (merge (read-storm-config)
+ {STORM-ZOOKEEPER-AUTH-SCHEME scheme
STORM-ZOOKEEPER-AUTH-PAYLOAD digest
STORM-PRINCIPAL-TO-LOCAL-PLUGIN
"backtype.storm.security.auth.DefaultPrincipalToLocal"
- NIMBUS-THRIFT-PORT 6666}
+ NIMBUS-THRIFT-PORT 6666})
--- End diff --
I know that the code was already there, but is it possible to use an
ephemeral port here?
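One way to do this, as a minimal sketch rather than what the pull request actually does: bind a server socket to port 0 so the OS assigns a free port, read the port number back, and use it in place of the hard-coded 6666. The helper name `available-port` and the `port-conf` map are hypothetical, and the string key "nimbus.thrift.port" stands in for the NIMBUS-THRIFT-PORT constant so the snippet runs without Storm on the classpath.

```clojure
(import 'java.net.ServerSocket)

;; Hypothetical helper, not part of the pull request.
(defn available-port
  "Binds a server socket to port 0 so the OS picks a free port,
   returns that port number, and closes the socket again."
  []
  (with-open [socket (ServerSocket. 0)]
    (.getLocalPort socket)))

;; Only the port entry of the test's auth-conf map is sketched here.
;; "nimbus.thrift.port" is used as a plain string key (assumption: in the
;; real test this would be the NIMBUS-THRIFT-PORT constant).
(def port-conf {"nimbus.thrift.port" (available-port)})
```

The socket is closed before the port is reused, so another process could in principle grab the port in between; for a unit test that is usually an acceptable trade-off compared with a fixed port that may already be in use.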
> ZK Becoming deadlocked with zookeeper_state_factory
> ---------------------------------------------------
>
> Key: STORM-1376
> URL: https://issues.apache.org/jira/browse/STORM-1376
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.11.0
> Reporter: Daniel Schonfeld
> Assignee: Sanket Reddy
> Priority: Blocker
>
> Since the introduction of blobstore and pacemaker we've noticed that when
> using nimbus with the new zookeeper_state_factory backing cluster state
> module, some of our ZK nodes become unresponsive and show an increasing
> number of outstanding requests (STAT 4-letter command).
> Terminating the storm supervisors and nimbus usually gets zookeeper to
> realize, after a few minutes, that those connections are dead and to become
> responsive again. In some extreme cases we have to kill that ZK node and
> bring it back up.
> Our topologies ran across ~10 supervisor nodes, each with about 400-500
> executors.
> I mention the number of executors because I am not sure whether someone, by
> mistake, made each executor start sending heartbeats instead of each worker,
> which might be the reason for this slowdown.
> Final note: if someone can jot down a few ideas about why this might be
> happening, I'd be more than happy to dig further into the storm code and
> submit a PR myself. But I need some hint or direction on where to go with
> this...