Nick Allen created METRON-261:
---------------------------------

             Summary: Storm Supervisors Fail to Start
                 Key: METRON-261
                 URL: https://issues.apache.org/jira/browse/METRON-261
             Project: Metron
          Issue Type: Bug
            Reporter: Nick Allen


After deployment completes, the Storm Supervisors often fail to start 
correctly.  This prevents any data from being ingested until the Supervisors 
are manually started.  

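It appears that the Supervisors fail to connect to Zookeeper, then time out 
and die.  Zookeeper may simply not be ready in time.  Not sure if this is 
something we can fix or if this is an Ambari issue.  The Supervisor log shows 
the connection being refused, followed by a connection timeout after the 
15000 ms limit: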

2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
        at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
        at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
        at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.daemon.supervisor$fn__8449$exec_fn__3614__auto____8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
        at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
        at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
        at backtype.storm.daemon.supervisor$fn__8449$mk_supervisor__8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
        at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
        at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
        at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
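
If the cause really is Zookeeper not being ready in time, one possible 
workaround is to have the deployment wait for Zookeeper before starting the 
Supervisors.  The following is only a rough sketch of that idea, not something 
taken from the Metron or Ambari code.  It assumes the Zookeeper four-letter 
command "ruok" is enabled on the server (newer Zookeeper releases may require 
it to be whitelisted).  The function names, retry count, and delay are 
hypothetical.

```python
import socket
import time


def zookeeper_is_ready(host, port=2181, timeout=2.0):
    """Return True if the server at host:port answers 'ruok' with 'imok'.

    Hypothetical helper: relies on ZooKeeper's four-letter 'ruok' command
    being enabled on the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            data = b""
            # Read until we have the 4-byte reply or the server closes.
            while len(data) < 4:
                chunk = sock.recv(4 - len(data))
                if not chunk:
                    break
                data += chunk
            return data == b"imok"
    except OSError:
        # Covers "Connection refused" and socket timeouts.
        return False


def wait_for_zookeeper(host, port=2181, attempts=30, delay=2.0):
    """Poll until ZooKeeper is ready; return False if it never answers."""
    for attempt in range(attempts):
        if zookeeper_is_ready(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A deployment step could call wait_for_zookeeper() with the Zookeeper host 
from the connection string and only start the Supervisors once it returns 
True.  This would not help if the failure has a different cause.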



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
