[ https://issues.apache.org/jira/browse/STORM-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Gresch updated STORM-3139:
--------------------------------
    Comment: was deleted

(was: Looking at all the code paths where credentials are removed, I do see this being triggered by Nimbus on restart from RemoveCorruptTopologies() for this failure:

{code:java}
2018-06-30 10:08:21.961 o.a.s.n.LeaderListenerCallback main-EventThread [INFO] active-topology-blobs [topology-testHardCoreFaultTolerance-0-14-1530352957,topology-testHardCoreFaultTolerance-1-15-1530352958,topology-testHardCoreFaultTolerance-2-16-1530352959,topology-testHardCoreFaultTolerance-3-17-1530352960,topology-testHardCoreFaultTolerance-4-18-1530352961,topology-testHardCoreFaultTolerance-5-19-1530352963,topology-testHardCoreFaultTolerance-6-20-1530352964,topology-testHardCoreFaultTolerance-7-21-1530352966,topology-testHardCoreFaultTolerance-8-22-1530352967] local-topology-blobs [topology-testHardCoreFaultTolerance-2-16-1530352959-stormcode.ser,topology-testHardCoreFaultTolerance-7-21-1530352966-stormconf.ser,topology-testHardCoreFaultTolerance-8-22-1530352967-stormcode.ser,topology-testHardCoreFaultTolerance-6-20-1530352964-stormconf.ser,topology-testHardCoreFaultTolerance-0-14-1530352957-stormcode.ser,topology-testHardCoreFaultTolerance-3-17-1530352960-stormconf.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormconf.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormjar.jar,topology-testHardCoreFaultTolerance-3-17-1530352960-stormcode.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormjar.jar,topology-testHardCoreFaultTolerance-7-21-1530352966-stormjar.jar,topology-testHardCoreFaultTolerance-3-17-1530352960-stormjar.jar,topology-testHardCoreFaultTolerance-6-20-1530352964-stormcode.ser,topology-testHardCoreFaultTolerance-7-21-1530352966-stormcode.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormcode.ser,topology-testHardCoreFaultTolerance-0-14-1530352957-stormconf.ser,topology-testHardCoreFaultTolerance-2-16-1530352959-stormconf.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormconf.ser,topology-testHardCoreFaultTolerance-4-18-1530352961-stormconf.ser,topology-testHardCoreFaultTolerance-2-16-1530352959-stormjar.jar,topology-testHardCoreFaultTolerance-0-14-1530352957-stormjar.jar,topology-testHardCoreFaultTolerance-8-22-1530352967-stormjar.jar,topology-testHardCoreFaultTolerance-4-18-1530352961-stormjar.jar,topology-testHardCoreFaultTolerance-6-20-1530352964-stormjar.jar,topology-testHardCoreFaultTolerance-4-18-1530352961-stormcode.ser,topology-testHardCoreFaultTolerance-8-22-1530352967-stormconf.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormcode.ser] diff-topology-blobs []
{code}

Spoke too soon, looks like the credentials were not deleted here....)

> worker fails to start - KeeperErrorCode = NoAuth for /credentials/topologyname
> ------------------------------------------------------------------------------
>
>                 Key: STORM-3139
>                 URL: https://issues.apache.org/jira/browse/STORM-3139
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Aaron Gresch
>            Priority: Major
>
>
> Seeing a sporadic test failure internally for us with a worker that won't come up. The test schedules a bunch of topologies, kills the supervisors, restarts nimbus, and then starts up the supervisors and validates that the topologies are all fully running.
>
> I've seen this test failure twice in the last two weeks.
> The worker has migrated and cannot come up:
>
> {code:java}
> 2018-06-30 10:15:24.102 b.s.util main [WARN] Expecting exception of class: class java.nio.channels.ClosedByInterruptException, but exception chain only contains: (#<RuntimeException java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966> #<NoAuthException org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966>)
> 2018-06-30 10:15:24.102 b.s.d.worker main [ERROR] Error on initialization of server mk-worker
> java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966
>     at backtype.storm.util$wrap_in_runtime.invoke(util.clj:53) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:135) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.cluster_state.zookeeper_state_factory$_mkState$reify__4249.get_data(zookeeper_state_factory.clj:125) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at org.apache.storm.pacemaker.pacemaker_state_factory$_mkState$reify__4296.get_data(pacemaker_state_factory.clj:175) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.cluster$mk_storm_cluster_state$reify__3910.credentials(cluster.clj:563) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$fn__7710$exec_fn__1599__auto____7711.invoke(worker.clj:623) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.6.0.jar:?]
>     at clojure.core$apply.invoke(core.clj:624) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$fn__7710$mk_worker__7803.doInvoke(worker.clj:598) [storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:810) [storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.10.2.y.jar:0.10.2.y]
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:113) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:131) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     ... 31 more
> 2018-06-30 10:15:24.199 b.s.util main [ERROR] Halting process: ("Error on initialization")
> java.lang.RuntimeException: ("Error on initialization")
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
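The WARN line at the top of the worker trace suggests that, before halting, the worker checked whether the failure's cause chain contained a `java.nio.channels.ClosedByInterruptException`. That exception would point to an interrupt-driven shutdown rather than a real error. Below is a minimal Java sketch of that kind of cause-chain check. It is not Storm's code. `CauseChainCheck` and `causeChainContains` are hypothetical names, and an `IllegalStateException` stands in for the shaded ZooKeeper `NoAuthException` so the example runs without a ZooKeeper dependency.

```java
import java.nio.channels.ClosedByInterruptException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class CauseChainCheck {

    // Returns true if t, or any Throwable reachable from it through getCause(),
    // is an instance of klass (subclasses match too, via Class.isInstance).
    // The identity-based "seen" set stops the walk if causes form a cycle
    // (a -> b -> a), which Throwable.initCause does not forbid.
    public static boolean causeChainContains(Class<? extends Throwable> klass, Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (klass.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the shaded ZooKeeper NoAuthException.
        Exception noAuth = new IllegalStateException(
            "KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966");
        // Mirrors the logged chain: a RuntimeException wrapping the NoAuth error.
        RuntimeException wrapped = new RuntimeException(noAuth);

        // Only an interrupt-driven failure would carry ClosedByInterruptException.
        boolean expectedInterrupt = causeChainContains(ClosedByInterruptException.class, wrapped);
        // prints "unexpected error: halt worker"
        System.out.println(expectedInterrupt ? "expected interrupt: ignore" : "unexpected error: halt worker");
    }
}
```

Neither link in the logged chain (`RuntimeException` wrapping `NoAuthException`) matches. A check like this would therefore treat the failure as fatal, which is consistent with the `Halting process` line that follows the trace.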