[ 
https://issues.apache.org/jira/browse/STORM-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530396#comment-16530396
 ] 

Aaron Gresch commented on STORM-3139:
-------------------------------------

Looking at all the code paths where credentials are removed, I see that for 
this failure the removal is triggered by Nimbus on restart, from 
RemoveCorruptTopologies():

 
{code:java}
2018-06-30 10:08:21.961 o.a.s.n.LeaderListenerCallback main-EventThread [INFO] 
active-topology-blobs 
[topology-testHardCoreFaultTolerance-0-14-1530352957,topology-testHardCoreFaultTolerance-1-15-1530352958,topology-testHardCoreFaultTolerance-2-16-1530352959,topology-testHardCoreFaultTolerance-3-17-1530352960,topology-testHardCoreFaultTolerance-4-18-1530352961,topology-testHardCoreFaultTolerance-5-19-1530352963,topology-testHardCoreFaultTolerance-6-20-1530352964,topology-testHardCoreFaultTolerance-7-21-1530352966,topology-testHardCoreFaultTolerance-8-22-1530352967]
 local-topology-blobs 
[topology-testHardCoreFaultTolerance-2-16-1530352959-stormcode.ser,topology-testHardCoreFaultTolerance-7-21-1530352966-stormconf.ser,topology-testHardCoreFaultTolerance-8-22-1530352967-stormcode.ser,topology-testHardCoreFaultTolerance-6-20-1530352964-stormconf.ser,topology-testHardCoreFaultTolerance-0-14-1530352957-stormcode.ser,topology-testHardCoreFaultTolerance-3-17-1530352960-stormconf.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormconf.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormjar.jar,topology-testHardCoreFaultTolerance-3-17-1530352960-stormcode.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormjar.jar,topology-testHardCoreFaultTolerance-7-21-1530352966-stormjar.jar,topology-testHardCoreFaultTolerance-3-17-1530352960-stormjar.jar,topology-testHardCoreFaultTolerance-6-20-1530352964-stormcode.ser,topology-testHardCoreFaultTolerance-7-21-1530352966-stormcode.ser,topology-testHardCoreFaultTolerance-1-15-1530352958-stormcode.ser,topology-testHardCoreFaultTolerance-0-14-1530352957-stormconf.ser,topology-testHardCoreFaultTolerance-2-16-1530352959-stormconf.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormconf.ser,topology-testHardCoreFaultTolerance-4-18-1530352961-stormconf.ser,topology-testHardCoreFaultTolerance-2-16-1530352959-stormjar.jar,topology-testHardCoreFaultTolerance-0-14-1530352957-stormjar.jar,topology-testHardCoreFaultTolerance-8-22-1530352967-stormjar.jar,topology-testHardCoreFaultTolerance-4-18-1530352961-stormjar.jar,topology-testHardCoreFaultTolerance-6-20-1530352964-stormjar.jar,topology-testHardCoreFaultTolerance-4-18-1530352961-stormcode.ser,topology-testHardCoreFaultTolerance-8-22-1530352967-stormconf.ser,topology-testHardCoreFaultTolerance-5-19-1530352963-stormcode.ser]
 diff-topology-blobs []
{code}
 

It seems there is a race condition in this detection, which compares the 
active topologies against the blobs.
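
To illustrate the kind of comparison involved, here is a minimal sketch. It is 
not the actual Nimbus code: the class and method names (TopologyBlobDiff, 
topologyIdsFromBlobKeys, missingBlobs) and the topology names are hypothetical, 
and it assumes topology ids can be recovered from blob keys by stripping the 
-stormcode.ser, -stormconf.ser and -stormjar.jar suffixes seen in the log 
above. It shows how a check run against an incomplete snapshot of the blob 
keys would flag a healthy active topology as having no blobs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TopologyBlobDiff {
    // Blob key suffixes as they appear in the log above.
    private static final List<String> SUFFIXES =
        Arrays.asList("-stormcode.ser", "-stormconf.ser", "-stormjar.jar");

    /** Recovers topology ids by stripping a known suffix; other keys are ignored. */
    public static Set<String> topologyIdsFromBlobKeys(Iterable<String> blobKeys) {
        Set<String> ids = new TreeSet<>();
        for (String key : blobKeys) {
            for (String suffix : SUFFIXES) {
                if (key.endsWith(suffix)) {
                    ids.add(key.substring(0, key.length() - suffix.length()));
                    break;
                }
            }
        }
        return ids;
    }

    /** Returns the active topology ids that have no blob in the given snapshot. */
    public static Set<String> missingBlobs(Set<String> activeIds,
                                           Iterable<String> blobKeySnapshot) {
        Set<String> missing = new TreeSet<>(activeIds);
        missing.removeAll(topologyIdsFromBlobKeys(blobKeySnapshot));
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical topology ids, for illustration only.
        Set<String> active = new TreeSet<>(Arrays.asList("topo-a-1-100", "topo-b-2-200"));

        // Snapshot taken before the blobs of topo-b became visible.
        List<String> incomplete = Arrays.asList(
            "topo-a-1-100-stormcode.ser", "topo-a-1-100-stormconf.ser");
        // Snapshot taken once all blobs are visible.
        List<String> complete = Arrays.asList(
            "topo-a-1-100-stormcode.ser", "topo-a-1-100-stormconf.ser",
            "topo-b-2-200-stormcode.ser", "topo-b-2-200-stormconf.ser");

        System.out.println(missingBlobs(active, incomplete)); // prints [topo-b-2-200]
        System.out.println(missingBlobs(active, complete));   // prints []
    }
}
```

If a cleanup step acted on the first result, it would remove state for a 
topology that is in fact intact, which is the sort of outcome a race between 
the two listings could produce.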

> worker fails to start - KeeperErrorCode = NoAuth for /credentials/topologyname
> ------------------------------------------------------------------------------
>
>                 Key: STORM-3139
>                 URL: https://issues.apache.org/jira/browse/STORM-3139
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Aaron Gresch
>            Priority: Major
>
>  
> Seeing a sporadic test failure internally for us with a worker that won't 
> come up.  The test schedules a bunch of topologies, kills the supervisors, 
> restarts nimbus, and then starts up the supervisors and validates the 
> topologies are all fully running.
>  
> I've seen this test failure twice in the last two weeks.  The worker has 
> migrated and cannot come up:
>  
> {code:java}
> 2018-06-30 10:15:24.102 b.s.util main [WARN] Expecting exception of class: class java.nio.channels.ClosedByInterruptException, but exception chain only contains: (#<RuntimeException java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966> #<NoAuthException org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966>)
> 2018-06-30 10:15:24.102 b.s.d.worker main [ERROR] Error on initialization of server mk-worker
> java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966
>     at backtype.storm.util$wrap_in_runtime.invoke(util.clj:53) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:135) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.cluster_state.zookeeper_state_factory$_mkState$reify__4249.get_data(zookeeper_state_factory.clj:125) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at org.apache.storm.pacemaker.pacemaker_state_factory$_mkState$reify__4296.get_data(pacemaker_state_factory.clj:175) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.cluster$mk_storm_cluster_state$reify__3910.credentials(cluster.clj:563) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$fn__7710$exec_fn__1599__auto____7711.invoke(worker.clj:623) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.6.0.jar:?]
>     at clojure.core$apply.invoke(core.clj:624) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$fn__7710$mk_worker__7803.doInvoke(worker.clj:598) [storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:810) [storm-core-0.10.2.y.jar:0.10.2.y]
>     at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.10.2.y.jar:0.10.2.y]
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /credentials/topology-testHardCoreFaultTolerance-7-21-1530352966
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:113) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:131) ~[storm-core-0.10.2.y.jar:0.10.2.y]
>     ... 31 more
> 2018-06-30 10:15:24.199 b.s.util main [ERROR] Halting process: ("Error on initialization")
> java.lang.RuntimeException: ("Error on initialization")
> {code}



