[ https://issues.apache.org/jira/browse/STORM-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208882#comment-16208882 ]
Jungtaek Lim commented on STORM-2706:
-------------------------------------

[~randgalt] Hi Jordan,

I think the ideal way to resolve the current issue would be a new Curator 2.x release. If that is not possible, we may need to go ahead and upgrade the Curator version. AFAIK, we intentionally did not update the Curator version because the newer version depends on ZooKeeper 3.5.x, which the ZooKeeper community had been calling 'alpha' and has recently started calling 'beta'. Does the Curator community consider ZooKeeper 3.5.x safe to upgrade to?

> Nimbus stuck in exception and does not fail fast
> ------------------------------------------------
>
> Key: STORM-2706
> URL: https://issues.apache.org/jira/browse/STORM-2706
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: Bijan Fahimi Shemrani
> Labels: nimbus
>
> We experience a problem in nimbus which leads it to get stuck in a retry and
> fail loop. When I manually restart the nimbus it works again as expected.
> However, it would be great if nimbus would shut down so our monitoring can
> automatically restart the nimbus.
> The nimbus log.
> {noformat}
> 24.8.2017 15:39:1913:39:19.804 [pool-13-thread-51] ERROR
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer -
> Unexpected throwable while invoking!
> 24.8.2017 > 15:39:19org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /storm/leader-lock > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) > 
~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown > Source) ~[?:?] > 24.8.2017 15:39:19 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > 24.8.2017 15:39:19 at java.lang.reflect.Method.invoke(Method.java:498) > ~[?:1.8.0_131] > 24.8.2017 15:39:19 at > clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:19 at > clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:19 at > org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown > Source) ~[?:?] > 24.8.2017 15:39:19 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > 24.8.2017 15:39:19 at java.lang.reflect.Method.invoke(Method.java:498) > ~[?:1.8.0_131] > 24.8.2017 15:39:19 at > clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:19 at > clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] 
> 24.8.2017 15:39:19 at > org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:19 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_131] > 24.8.2017 15:39:19 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_131] > 24.8.2017 15:39:19 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131] > 24.8.2017 15:39:2713:39:27.205 [pool-13-thread-52] ERROR > org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - > Unexpected throwable while invoking! 
> 24.8.2017 > 15:39:27org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /storm/leader-lock > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) > 
~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown > Source) ~[?:?] > 24.8.2017 15:39:27 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > 24.8.2017 15:39:27 at java.lang.reflect.Method.invoke(Method.java:498) > ~[?:1.8.0_131] > 24.8.2017 15:39:27 at > clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:27 at > clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:27 at > org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown > Source) ~[?:?] > 24.8.2017 15:39:27 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > 24.8.2017 15:39:27 at java.lang.reflect.Method.invoke(Method.java:498) > ~[?:1.8.0_131] > 24.8.2017 15:39:27 at > clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > 24.8.2017 15:39:27 at > clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] 
> 24.8.2017 15:39:27 at > org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) > ~[storm-core-1.1.1.jar:1.1.1] > 24.8.2017 15:39:27 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_131] > 24.8.2017 15:39:27 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_131] > 24.8.2017 15:39:27 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131] > 24.8.2017 15:39:2913:39:29.270 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping assignments > 24.8.2017 15:39:2913:39:29.270 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping cleanup > 24.8.2017 15:39:3913:39:39.270 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping assignments > 24.8.2017 15:39:3913:39:39.270 
[timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping cleanup > 24.8.2017 15:39:4913:39:49.271 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping assignments > 24.8.2017 15:39:4913:39:49.272 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping cleanup > 24.8.2017 15:39:5913:39:59.272 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping assignments > 24.8.2017 15:39:5913:39:59.272 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping cleanup > 24.8.2017 15:40:0913:40:09.272 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping assignments > 24.8.2017 15:40:0913:40:09.272 [timer] INFO org.apache.storm.daemon.nimbus - > not a leader, skipping cleanup > 24.8.2017 15:40:1313:40:13.806 [timer] INFO > org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl > - Starting > 24.8.2017 15:40:1313:40:13.807 [timer] INFO > org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client > connection, connectString=zookeeper:2181/storm sessionTimeout=20000 > watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354 > 24.8.2017 15:40:1313:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO > org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket > connection to server 10.42.174.214/10.42.174.214:2181. 
Will not attempt to
> authenticate using SASL (unknown error)
> 24.8.2017 15:40:1313:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection
> established to 10.42.174.214/10.42.174.214:2181, initiating session
> 24.8.2017 15:40:1313:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session
> establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid
> = 0x15e14456dc70045, negotiated timeout = 20000
> 24.8.2017 15:40:1313:40:13.910 [timer] INFO
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session:
> 0x15e14456dc70045 closed
> 24.8.2017 15:40:1313:40:13.910 [timer-EventThread] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)