[
https://issues.apache.org/jira/browse/STORM-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854287#comment-13854287
]
Robert Joseph Evans commented on STORM-180:
-------------------------------------------
How many ZooKeeper nodes did you have down, out of how many total? Judging by
the stack trace, the failure is caused by Curator timing out while waiting for
ZooKeeper to connect to the ensemble. At that point Curator is waiting for
ZooKeeper to call back into the watcher with a SyncConnected event. So this
may be an issue with ZooKeeper, or Curator may have missed some other
connection state, or you may genuinely not have a quorum, in which case zk is
doing the correct thing by not connecting. For example, if you have 2 zk nodes
and one of them is down, the remaining node is not a majority, so there is no
quorum.
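
To make the quorum point concrete, here is a minimal sketch of the majority
rule. The class and method names (QuorumCheck, majority, hasQuorum) are
hypothetical and are not part of ZooKeeper, Curator, or Storm; the sketch only
assumes that an ensemble needs a strict majority of its servers up.

```java
public class QuorumCheck {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if a strict majority is still up when `down` servers are unavailable.
    static boolean hasQuorum(int ensembleSize, int down) {
        return ensembleSize - down >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        // Each pair is {ensemble size, servers down}; the values are illustrative.
        int[][] cases = { {2, 1}, {3, 1}, {3, 2}, {5, 2} };
        for (int[] c : cases) {
            System.out.printf("ensemble=%d down=%d quorum=%b%n",
                    c[0], c[1], hasQuorum(c[0], c[1]));
        }
    }
}
```

Under that assumption, a 2-node ensemble with one node down has no quorum,
while a 3-node ensemble with one node down still does, so the answer to the
question above decides which of the explanations applies.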
> Nimbus fails to start In case of non-worked zookeeper server specifed in
> config
> -------------------------------------------------------------------------------
>
> Key: STORM-180
> URL: https://issues.apache.org/jira/browse/STORM-180
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Environment: Storm 0.9.0.1
> Reporter: Alexander Yerenkow
> Labels: nimbus, zookeeper
>
> For example, if I specify three ZooKeeper servers in the config and any one
> of them is not working when I run
> `storm nimbus`
> then nimbus fails to start at all.
> Can this be avoided somehow?
> If one of the ZooKeeper servers fails, I have to reconfigure all Storm
> instances just to start them and connect to the working zk servers.
> Backtrace:
> 2013-12-20 14:08:08 b.s.d.nimbus [ERROR] Error on initialization of server service-handler
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at backtype.storm.util$wrap_in_runtime.invoke(util.clj:28) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.zookeeper$exists_node_QMARK_$fn__991.invoke(zookeeper.clj:82) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:78) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:92) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:26) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.cluster$mk_storm_cluster_state.invoke(cluster.clj:201) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.daemon.nimbus$nimbus_data.invoke(nimbus.clj:51) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.daemon.nimbus$fn__5528$exec_fn__1229__auto____5529.invoke(nimbus.clj:884) ~[storm-core-0.9.0.1.jar:na]
>     at clojure.lang.AFn.applyToHelper(AFn.java:163) ~[clojure-1.4.0.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
>     at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
>     at backtype.storm.daemon.nimbus$fn__5528$service_handler__5616.doInvoke(nimbus.clj:881) ~[storm-core-0.9.0.1.jar:na]
>     at clojure.lang.RestFn.invoke(RestFn.java:421) ~[clojure-1.4.0.jar:na]
>     at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1136) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1167) ~[storm-core-0.9.0.1.jar:na]
>     at backtype.storm.daemon.nimbus$_main.invoke(nimbus.clj:1189) ~[storm-core-0.9.0.1.jar:na]
>     at clojure.lang.AFn.applyToHelper(AFn.java:159) ~[clojure-1.4.0.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
>     at backtype.storm.daemon.nimbus.main(Unknown Source) ~[storm-core-0.9.0.1.jar:na]
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72) ~[curator-client-1.0.1.jar:na]
>     at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74) ~[curator-client-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353) ~[curator-framework-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:149) ~[curator-framework-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:138) ~[curator-framework-1.0.1.jar:na]
>     at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) ~[curator-client-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:134) ~[curator-framework-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:125) ~[curator-framework-1.0.1.jar:na]
>     at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:34) ~[curator-framework-1.0.1.jar:na]
>     at backtype.storm.zookeeper$exists_node_QMARK_$fn__991.invoke(zookeeper.clj:81) ~[storm-core-0.9.0.1.jar:na]
>     ... 17 common frames omitted
> 2013-12-20 14:08:08 b.s.util [INFO] Halting process: ("Error on initialization")