[ 
https://issues.apache.org/jira/browse/STORM-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2987:
----------------------------------
    Labels: pull-request-available  (was: )

> PaceMakerStateStorage should deal with InterruptedException correctly
> ---------------------------------------------------------------------
>
>                 Key: STORM-2987
>                 URL: https://issues.apache.org/jira/browse/STORM-2987
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>              Labels: pull-request-available
>
> We found an issue that when nimbus restarted, it can only get leadership 
> after a few seconds (15~20s). 
>  
> {code:java}
> 2018-02-27 08:18:43.420 main o.a.s.z.LeaderElectorImp [INFO] Queued up for 
> leader lock.
> 2018-02-27 08:18:43.481 main o.a.s.d.m.MetricsUtils [INFO] Using statistics 
> reporter 
> plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
> 2018-02-27 08:18:43.483 main o.a.s.d.m.r.JmxPreparableReporter [INFO] 
> Preparing...
> 2018-02-27 08:18:43.499 main o.a.s.m.StormMetricsRegistry [INFO] Started 
> statistics report plugin...
> 2018-02-27 08:18:43.543 main o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-02-27 08:18:43.551 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-02-27 08:18:43.551 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-02-27 08:18:43.552 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread 
> started.
> 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting 
> at:        Tue Feb 27 08:18:43 UTC 2018
> 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires:       
>            Wed Feb 28 08:18:43 UTC 2018
> 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh 
> sleeping until: Wed Feb 28 04:35:55 UTC 2018
> 2018-02-27 08:18:43.553 main o.a.z.ZooKeeper [INFO] Initiating client 
> connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 
> sessionTimeout=60000 watcher=org.apache.
> curator.ConnectionState@2e185cd7
> 2018-02-27 08:18:43.559 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default 
> schema
> 2018-02-27 08:18:43.560 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) 
> o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-02-27 08:18:43.561 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co
> m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context 
> section 'Client'
> 2018-02-27 08:18:43.562 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1
> 0.215.68.156:2181, initiating session
> 2018-02-27 08:18:43.565 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah
> oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae970099, negotiated timeout 
> = 40000
> 2018-02-27 08:18:43.565 main-EventThread o.a.c.f.s.ConnectionStateManager 
> [INFO] State change: CONNECTED
> 2018-02-27 08:18:43.605 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl 
> [INFO] backgroundOperationsLoop exiting
> 2018-02-27 08:18:43.625 main o.a.z.ZooKeeper [INFO] Session: 
> 0x161d5f1ae970099 closed
> 2018-02-27 08:18:43.625 main-EventThread o.a.z.ClientCnxn [INFO] EventThread 
> shut down
> 2018-02-27 08:18:43.626 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-02-27 08:18:43.626 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-02-27 08:18:43.635 main o.a.z.ZooKeeper [INFO] Initiating client 
> connection, 
> connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI 
> sessionTimeout=60000 w
> atcher=org.apache.curator.ConnectionState@46cc127b
> 2018-02-27 08:18:43.654 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) 
> o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-02-27 08:18:43.660 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co
> m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context 
> section 'Client'
> 2018-02-27 08:18:43.663 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default 
> schema
> 2018-02-27 08:18:43.663 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1
> 0.215.68.156:2181, initiating session
> 2018-02-27 08:18:43.666 
> main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn 
> [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah
> oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae97009a, negotiated timeout 
> = 40000
> 2018-02-27 08:18:43.669 main-EventThread o.a.c.f.s.ConnectionStateManager 
> [INFO] State change: CONNECTED
> 2018-02-27 08:18:43.790 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server 
> for storm version '2.0.0.y'
> 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> assignments
> 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> cleanup
> 2018-02-27 08:18:44.300 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: blob-5-1518767144-stormcode.ser
> 2018-02-27 08:18:44.301 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: 
> TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormcode.ser
> 2018-02-27 08:18:44.302 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: 
> logviewer-ui-groups-test-1-1518940914-stormcode.ser
> 2018-02-27 08:18:44.303 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: blob-5-1518800831-stormcode.ser
> 2018-02-27 08:18:44.304 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: blob-5-1518767144-stormconf.ser
> 2018-02-27 08:18:44.306 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: 
> logviewer-ui-groups-test-1-1518940914-stormconf.ser
> 2018-02-27 08:18:44.307 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: 
> TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormconf.ser
> 2018-02-27 08:18:44.308 timer o.a.s.b.BlobStoreUtils [ERROR] Could not 
> download the blob with key: blob-5-1518800831-stormconf.ser
> 2018-02-27 08:18:44.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> credential renewal.
> 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> assignments
> 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> cleanup
> 2018-02-27 08:18:59.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping 
> credential renewal.
> 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] 
> active-topology-blobs [] local-topology-blobs [] diff-topology-blobs []
> 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] 
> active-topology-dependencies [] local-blobs [] diff-topology-dependencies []
> 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting 
> leadership, all active topologies and corresponding dependencies found 
> locally.
> 2018-02-27 08:19:04.754 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 442 ms 
> for 0 topologies
> {code}
>  
> This can be re-produced by the following steps:
> {code:java}
> 1. restart pacemaker;
> 2. before pacemaker is up, restart nimbus{code}
> When we restart nimbus process,  it runs ShutDownHooks and stuck on 
> timer.close(). 
> [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L4227]
> The timer is not able to close because it's waiting for  doCleanup() to stop. 
> However, the interruptedException is caught and ate in : 
> [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L180-L192]
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to