[ https://issues.apache.org/jira/browse/METRON-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629876#comment-15629876 ]
Casey Stella commented on METRON-261:
-------------------------------------
Is this still happening, Nick? I haven't experienced it yet.
> Storm Supervisors Fail to Start
> -------------------------------
>
> Key: METRON-261
> URL: https://issues.apache.org/jira/browse/METRON-261
> Project: Metron
> Issue Type: Bug
> Reporter: Nick Allen
> Priority: Minor
> Labels: platform
> Fix For: 0.2.1BETA
>
>
> After deployment completes, the Storm Supervisors often fail to start
> correctly. This prevents any data from being ingested until the Supervisors
> are manually started.
> It appears that the Supervisors fail to communicate with ZooKeeper, then
> time out and die. ZooKeeper may just not be ready in time. Not sure if this
> is something we can fix or if this is an Ambari issue.
> 2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
> at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> 2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
> org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
> at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
> at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.daemon.supervisor$fn__8449$exec_fn__3614__auto____8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
> at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
> at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
> at backtype.storm.daemon.supervisor$fn__8449$mk_supervisor__8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
> at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
> at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
> at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]