RuiLi8080 opened a new pull request #3353: URL: https://github.com/apache/storm/pull/3353
## What is the purpose of the change

Add a `submitLock` to `leaderCallBack` to avoid a race condition.

## How was the change tested

First, we reproduced the NPE by adding a 60s sleep right before this step:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L222

When the sleep starts, we restart ZooKeeper to trigger a leader re-election and kill the test topology.

The lock prevents the race condition even with the 60s sleep. Note the 60s gap in the timestamps.

Nimbus log:
```
2020-11-17 06:24:25.114 o.a.s.c.StormClusterStateImpl main-EventThread [INFO] syncRemoteAssignments sleeps for 60s
2020-11-17 06:24:36.126 o.a.s.d.n.Nimbus pool-34-thread-28 [INFO] TRANSITION: wc-1-1605594107 KILL null true
... 60s sleep ...
2020-11-17 06:25:26.704 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 GAIN_LEADERSHIP null false
2020-11-17 06:25:26.742 o.a.s.d.n.Nimbus timer [INFO] Delaying event REMOVE for 30 secs for wc-1-1605594107
2020-11-17 06:25:55.149 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 REMOVE null false
2020-11-17 06:25:55.154 o.a.s.d.n.Nimbus timer [INFO] Killing topology: wc-1-1605594107
```

Client console log:
```
-bash-4.2$ storm kill wc
Running: /home/y/share/yjava_jdk/java/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/home/y/lib64/storm/2.3.0.y -Dstorm.log.dir=/home/y/lib64/storm/2.3.0.y/logs -Djava.library.path=/home/y/lib64:/usr/local/lib64:/usr/lib64:/lib64: -Dstorm.conf.file= -cp /home/y/lib64/storm/2.3.0.y/*:/home/y/lib64/storm/2.3.0.y/lib/*:/home/y/lib64/storm/2.3.0.y/extlib/*:/home/y/lib64/storm/2.3.0.y/extlib-daemon/*:/home/y/lib64/storm/current/conf:/home/y/lib64/storm/2.3.0.y/bin org.apache.storm.command.KillTopology wc
06:24:35.567 [main] INFO o.a.s.v.ConfigValidation - Will use [class org.apache.storm.DaemonConfig, class org.apache.storm.Config] for validation
06:24:35.715 [main] WARN o.a.s.v.ConfigValidation - Field public static final java.lang.String org.apache.storm.DaemonConfig.STORM_RESOURCE_ISOLATION_PLUGIN does not have validator annotation
06:24:35.726 [main] WARN o.a.s.v.ConfigValidation - topology.backpressure.enable is a deprecated config please see class org.apache.storm.Config.TOPOLOGY_BACKPRESSURE_ENABLE for more information.
06:24:35.868 [main] INFO o.a.s.m.n.Login - Successfully logged in to context StormClient using /etc/grid-keytabs/jaas.conf
06:24:35.871 [Refresh-TGT] INFO o.a.s.m.n.Login - TGT refresh thread started.
06:24:35.897 [Refresh-TGT] INFO o.a.s.m.n.Login - TGT valid starting at: Tue Nov 17 05:56:26 UTC 2020
06:24:35.897 [Refresh-TGT] INFO o.a.s.m.n.Login - TGT expires: Wed Nov 18 05:56:26 UTC 2020
06:24:35.898 [Refresh-TGT] INFO o.a.s.m.n.Login - TGT refresh sleeping until: Wed Nov 18 02:13:43 UTC 2020
06:24:36.077 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : openstorm3blue-n4.blue.ygrid.yahoo.com:50560
... 60s sleep ...
06:25:25.181 [main] INFO o.a.s.c.KillTopology - Killed topology: wc
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
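For readers unfamiliar with the pattern, the idea of guarding the leader callback with the same lock as topology operations can be sketched roughly as below. This is a minimal illustration only, not the code from the PR: apart from the names `submitLock` and `leaderCallback`, which come from the PR description, the class, the in-memory map, and the `pause` helper are all hypothetical stand-ins. The pause plays the role of the 60s sleep used in the test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: this class and its state are illustrative and are not
// taken from the Storm code base. Only the names submitLock and leaderCallback
// echo the PR description.
class LeaderCallbackLockSketch {
    // One lock shared by topology operations and the leader callback.
    private final Object submitLock = new Object();
    // Assumed in-memory state: topology id -> assignment description.
    private final Map<String, String> assignments = new HashMap<>();

    void addTopology(String id) {
        synchronized (submitLock) {
            assignments.put(id, "assignment-of-" + id);
        }
    }

    void killTopology(String id) {
        synchronized (submitLock) {
            assignments.remove(id);
        }
    }

    // Runs when this node gains leadership. Returns how many assignments were synced.
    int leaderCallback(long pauseMillis) {
        synchronized (submitLock) {
            int synced = 0;
            for (String id : new ArrayList<>(assignments.keySet())) {
                pause(pauseMillis); // stands in for a slow sync step
                // Without the lock, a concurrent killTopology could remove the
                // entry here, and the dereference below would throw an NPE.
                String assignment = assignments.get(id);
                if (!assignment.isEmpty()) {
                    synced++;
                }
            }
            return synced;
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch a kill request issued while the callback is paused simply blocks until the callback releases the lock, which would be consistent with the client log above returning only after the sleep ends.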
