Nick Allen created METRON-261:
---------------------------------
Summary: Storm Supervisors Fail to Start
Key: METRON-261
URL: https://issues.apache.org/jira/browse/METRON-261
Project: Metron
Issue Type: Bug
Reporter: Nick Allen
After deployment completes, the Storm Supervisors often fail to start
correctly. This prevents any data from being ingested until the Supervisors
are manually started.
It appears that the Supervisors fail to connect to ZooKeeper, then time out
and die. ZooKeeper may simply not be ready in time. It is not clear whether
this is something we can fix or whether it is an Ambari issue.
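If the cause is indeed that ZooKeeper is not yet accepting connections, one possible workaround would be to have the deployment wait for the ZooKeeper client port before starting the Supervisors. The following is only a minimal sketch in Python under that assumption. The function name `wait_for_port` and its parameters are hypothetical and are not part of Metron, Storm, or Ambari. It checks only that the TCP port accepts connections, not that ZooKeeper is fully healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=2.0):
    """Poll host:port until it accepts a TCP connection.

    Hypothetical helper: returns True as soon as a connection succeeds,
    or False if no connection succeeded before timeout_s elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out; retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A deployment step could call `wait_for_port(zk_host, 2181)` and start the Supervisors only when it returns True. Here `zk_host` stands for the ZooKeeper host, and 2181 is the port that appears in the connection string in the log.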
2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
    at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
    at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.daemon.supervisor$fn__8449$exec_fn__3614__auto____8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.supervisor$fn__8449$mk_supervisor__8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
    at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]