[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342818#comment-15342818
 ] 

Chris Nauroth commented on ZOOKEEPER-2380:
------------------------------------------

Actually, I was wrong in my earlier comment.  It's acceptable for test coverage 
to enter either LOOKING or FOLLOWING state.  In both cases, since it has left 
LEADING state, by definition that means it lost its previous quorum, and 
therefore the code path is covered.  The change in revision 06 is sufficient.  
Thank you.

I just have one more request.

{code}
        // shutdown 2 followers so that leader does not have majority and goes
        // in looking state
        shutdownFollowers(mt);
        assertTrue("Leader failed to transition to LOOKING or FOLLOWING state",
                ClientBase.waitForServerState(leader, 15000,
                        QuorumStats.Provider.LOOKING_STATE,
                        QuorumStats.Provider.FOLLOWING_STATE));
{code}

Please update the comment to state that FOLLOWING is acceptable too, depending 
on the timing of the test.
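
To illustrate why accepting either state is reasonable, here is a minimal, self-contained sketch of a polling wait that succeeds as soon as the server reports any one of several acceptable states. It is not the actual ClientBase.waitForServerState implementation. The class StateWaitSketch, the method waitForAnyState, the state strings, and the 10 ms poll interval are all hypothetical names and values chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class StateWaitSketch {
    /**
     * Polls stateSupplier until it reports one of acceptableStates or until
     * timeoutMs elapses. Returns true if an acceptable state was observed.
     * Hypothetical helper for illustration only.
     */
    static boolean waitForAnyState(Supplier<String> stateSupplier,
                                   long timeoutMs,
                                   String... acceptableStates) {
        List<String> accepted = Arrays.asList(acceptableStates);
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            // Any acceptable state counts: which one the server reaches first
            // depends on timing, so the caller lists every valid outcome.
            if (accepted.contains(stateSupplier.get())) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10); // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With a helper of this shape, a former leader that goes straight to following and one that is still looking when the check runs both pass, which matches the intent of the assertion above.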

> Deadlock between leader shutdown and forwarding ACK to the leader
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2380
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2380
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-2380-01.patch, ZOOKEEPER-2380-02.patch, 
> ZOOKEEPER-2380-03.patch, ZOOKEEPER-2380-04.patch, ZOOKEEPER-2380-05.patch, 
> ZOOKEEPER-2380-06.patch, ZOOKEEPER-2380-fail.out
>
>
> ZooKeeper deadlocks while shutting itself down, which makes the ZooKeeper 
> service unavailable because the deadlocked server is the leader. Here is the 
> thread dump:
> {code}
> "QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)" #25 prio=5 os_prio=0 tid=0x00007fbc502a6800 nid=0x834 in Object.wait() [0x00007fbc4d9a8000]
>    java.lang.Thread.State: WAITING (on object monitor)
>      at java.lang.Object.wait(Native Method)
>      at java.lang.Thread.join(Thread.java:1245)
>      - locked <0x00000000feb78000> (a org.apache.zookeeper.server.SyncRequestProcessor)
>      at java.lang.Thread.join(Thread.java:1319)
>      at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:196)
>      at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:90)
>      at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:1016)
>      at org.apache.zookeeper.server.quorum.LeaderRequestProcessor.shutdown(LeaderRequestProcessor.java:78)
>      at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:561)
>      - locked <0x00000000feb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>      at org.apache.zookeeper.server.quorum.QuorumZooKeeperServer.shutdown(QuorumZooKeeperServer.java:169)
>      - locked <0x00000000feb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>      at org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.shutdown(LeaderZooKeeperServer.java:102)
>      - locked <0x00000000feb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>      at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:637)
>      at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:590)
>      - locked <0x00000000feb781a0> (a org.apache.zookeeper.server.quorum.Leader)
>      at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1108)
> "SyncThread:1" #46 prio=5 os_prio=0 tid=0x00007fbc5848f000 nid=0x867 waiting for monitor entry [0x00007fbc4ca90000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>      at org.apache.zookeeper.server.quorum.Leader.processAck(Leader.java:784)
>      - waiting to lock <0x00000000feb781a0> (a org.apache.zookeeper.server.quorum.Leader)
>      at org.apache.zookeeper.server.quorum.AckRequestProcessor.processRequest(AckRequestProcessor.java:46)
>      at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:183)
>      at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> {code}
> Leader.lead() calls shutdown() from within a synchronized block, so it holds 
> the lock on the Leader instance:
> {code}
> while (true) {
>                 synchronized (this) {
>                 long start = Time.currentElapsedTime();
>                               .....
> {code}
> In the shutdown flow, SyncThread tries to acquire the lock on the same 
> Leader instance. 
> The leader thread holds the lock and waits for SyncThread to shut down, while 
> SyncThread waits for that lock in order to complete its shutdown. This is how 
> ZooKeeper enters the deadlock.
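
The cycle described above (one thread joins a worker while holding a monitor that the worker needs for its final step) can be avoided by releasing the monitor before joining. The following is a minimal sketch of that general pattern, not the actual ZOOKEEPER-2380 patch. The class JoinOutsideLockSketch and all of its members are hypothetical: the lock field stands in for the Leader monitor and the worker thread stands in for SyncThread.

```java
class JoinOutsideLockSketch {
    private final Object lock = new Object();   // stands in for the Leader monitor
    private volatile boolean running = true;
    private int acks = 0;                       // guarded by lock

    // Plays the role of SyncThread: its final step needs the monitor,
    // analogous to the processAck call in the thread dump.
    private final Thread worker = new Thread(() -> {
        while (running) {
            Thread.yield();
        }
        synchronized (lock) {
            acks++;
        }
    });

    void start() {
        worker.start();
    }

    /** Stops the worker; returns true if it terminated within timeoutMs. */
    boolean shutdown(long timeoutMs) {
        synchronized (lock) {
            // Decide to shut down while holding the monitor...
            running = false;
        }
        // ...but join only after releasing it. Calling join() inside the
        // synchronized block would recreate the deadlock: this thread would
        // wait for the worker while the worker waits for the monitor.
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    int acks() {
        synchronized (lock) {
            return acks;
        }
    }
}
```

Moving the join out of the synchronized block is only one way to break the cycle. Whether it fits the real Leader code depends on what else that block protects.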



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
