[
https://issues.apache.org/jira/browse/STORM-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stig Rohde Døssing resolved STORM-3103.
---------------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
Thanks [~agresch], merged to master. Please open another PR if you'd like this
fix in the 1.x branches as well. It didn't cherry-pick cleanly, but I think the
conflicts are probably just whitespace changes.
> nimbus stuck shutting down causing leadership issues on startup
> ---------------------------------------------------------------
>
> Key: STORM-3103
> URL: https://issues.apache.org/jira/browse/STORM-3103
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Aaron Gresch
> Assignee: Aaron Gresch
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When debugging a Nimbus NPE that caused restarts, I noticed that a forced
> halt occurred:
>
> {code:java}
> 2018-05-24 09:27:05.569 o.a.z.ClientCnxn main-SendThread(openqe82blue-gw.blue.ygrid.yahoo.com:2181) [INFO] Opening socket connection to server openqe82blue-gw.blue.ygrid.yahoo.com/10.215.77.115:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-05-24 09:27:05.570 o.a.z.ClientCnxn main-SendThread(openqe82blue-gw.blue.ygrid.yahoo.com:2181) [INFO] Socket connection established to openqe82blue-gw.blue.ygrid.yahoo.com/10.215.77.115:2181, initiating session
> 2018-05-24 09:27:05.571 o.a.z.ClientCnxn main-SendThread(openqe82blue-gw.blue.ygrid.yahoo.com:2181) [INFO] Session establishment complete on server openqe82blue-gw.blue.ygrid.yahoo.com/10.215.77.115:2181, sessionid = 0x1624a86300f7f6b, negotiated timeout = 40000
> 2018-05-24 09:27:05.571 o.a.c.f.s.ConnectionStateManager main-EventThread [INFO] State change: CONNECTED
> 2018-05-24 09:27:05.636 o.a.s.d.n.Nimbus main [INFO] Starting nimbus server for storm version '2.0.0.y'
> 2018-05-24 09:27:06.012 o.a.s.d.n.Nimbus timer [ERROR] Error while processing event
> java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2685) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.StormTimer$1.run(StormTimer.java:111) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:227) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> Caused by: java.lang.NullPointerException
>     at org.apache.storm.daemon.nimbus.Nimbus.readAllSupervisorDetails(Nimbus.java:1814) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.computeNewSchedulerAssignments(Nimbus.java:1906) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2057) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2003) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2681) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     ... 2 more
> 2018-05-24 09:27:06.023 o.a.s.u.Utils timer [ERROR] Halting process: Error while processing event
> java.lang.RuntimeException: Halting process: Error while processing event
>     at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.lambda$new$17(Nimbus.java:484) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:252) ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-05-24 09:27:06.032 o.a.s.d.n.Nimbus Thread-12 [INFO] Shutting down master
> 2018-05-24 09:27:06.032 o.a.s.u.Utils Thread-13 [INFO] Halting after 5 seconds
> {code}
> At times this would cause leadership confusion:
>
> {code:java}
> 2018-05-24 09:27:21.762 o.a.s.z.LeaderElectorImp main [INFO] Queued up for leader lock.
> 2018-05-24 09:27:22.604 o.a.s.d.n.Nimbus timer [INFO] not a leader, skipping assignments
> 2018-05-24 09:27:22.604 o.a.s.d.n.Nimbus timer [INFO] not a leader, skipping cleanup
> 2018-05-24 09:27:22.633 o.a.s.d.n.Nimbus timer [INFO] not a leader, skipping credential renewal.
> 2018-05-24 09:27:40.771 o.a.s.d.n.Nimbus pool-37-thread-63 [WARN] Topology submission exception. (topology name='topology-testOverSubscribe-1')
> java.lang.RuntimeException: not a leader, current leader is NimbusInfo{host='openqe82blue-n1.blue.ygrid.yahoo.com', port=50560, isLeader=true}
>     at org.apache.storm.daemon.nimbus.Nimbus.assertIsLeader(Nimbus.java:1311) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2807) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3454) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3438) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[libthrift-0.9.3.jar:0.9.3]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 2018-05-24 09:27:40.771 o.a.s.b.BlobStoreUtils Timer-1 [ERROR] Could not download the blob with key: topology-testOverCapacityScheduling-2-1519992333-stormcode.ser
> 2018-05-24 09:27:40.771 o.a.t.s.TThreadPoolServer pool-37-thread-63 [ERROR] Error occurred during processing of message.
> java.lang.RuntimeException: java.lang.RuntimeException: not a leader, current leader is NimbusInfo{host='openqe82blue-n1.blue.ygrid.yahoo.com', port=50560, isLeader=true}
>     at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2961) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3454) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3438) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) ~[storm-client-2.0.0.y.jar:2.0.0.y]
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[libthrift-0.9.3.jar:0.9.3]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> Caused by: java.lang.RuntimeException: not a leader, current leader is NimbusInfo{host='openqe82blue-n1.blue.ygrid.yahoo.com', port=50560, isLeader=true}
>     at org.apache.storm.daemon.nimbus.Nimbus.assertIsLeader(Nimbus.java:1311) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2807) ~[storm-server-2.0.0.y.jar:2.0.0.y]
>     ... 9 more
> {code}
> We should endeavor to shut down cleanly.
>
>
>
>
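For readers following the thread, the "Halting after 5 seconds" line in the first log suggests a cleanup step that is given a fixed window before the process is forcibly halted. The sketch below illustrates that general pattern only; it is not the STORM-3103 patch and does not use any Storm API. The class `BoundedShutdown` and the method `shutdownWithDeadline` are hypothetical names, and the halt action is passed in as a `Runnable` so that the behaviour can be exercised without killing the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Apache Storm.
final class BoundedShutdown {
    private BoundedShutdown() {}

    /**
     * Runs cleanup on a daemon thread and waits up to timeoutMs for it to return.
     * Returns true if cleanup returned (normally or by throwing) within the window;
     * otherwise runs onTimeout and returns false.
     */
    static boolean shutdownWithDeadline(Runnable cleanup, long timeoutMs, Runnable onTimeout) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                cleanup.run();
            } finally {
                done.countDown(); // signal completion even if cleanup throws
            }
        }, "shutdown-cleanup");
        worker.setDaemon(true); // a stuck cleanup must not keep the JVM alive
        worker.start();

        boolean finished = false;
        try {
            finished = done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status and treat the cleanup as unfinished.
            Thread.currentThread().interrupt();
        }
        if (!finished) {
            onTimeout.run();
        }
        return finished;
    }
}
```

In a real daemon the `onTimeout` argument might be something like `() -> Runtime.getRuntime().halt(1)`, with the cleanup releasing resources such as a leader lock before that deadline. Whether that matches what the merged fix does is an assumption, not something this message states.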
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)