[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345456#comment-15345456
 ] 

Michael Han commented on ZOOKEEPER-2080:
----------------------------------------

The root cause of FLE shutdown never returning is a deadlock introduced as part 
of ZOOKEEPER-107. The deadlock happens between the WorkerReceiver thread of the 
Messenger in FastLeaderElection and the Listener thread in QuorumCnxManager 
when FastLeaderElection restarts leader election as part of a dynamic 
reconfiguration change. An example:

# FastLeaderElection requests a [restart of leader 
election|https://github.com/apache/zookeeper/blob/ec056d3c3a18b862d0cd83296b7d4319652b0b1c/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L303].
 Note that this block is synchronized on the QuorumPeer object self. 
# Restarting leader election requires shutting down the existing 
QuorumCnxManager first, which requires [waiting for the listener thread to 
finish 
execution|https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L539].
# At the same time, the listener thread could be in a state where it is 
[initiating new outbound 
connections|https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L355].
# While in that state, the listener thread can invoke connectOne, which is 
[synchronized on the same QuorumPeer 
object|https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L475]
 whose lock FLE shutdown acquired earlier.
# As a result, FastLeaderElection waits for the listener thread to finish while 
the listener thread waits for FLE shutdown to release the intrinsic lock on 
QuorumPeer: a deadlock (see the sketch after this list).
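
To make the lock ordering concrete, here is a minimal, self-contained Java 
sketch of the pattern (the class, field, and thread names are illustrative 
placeholders, not the actual ZooKeeper code): one thread holds the monitor of 
the shared object and joins the listener without a timeout, while the listener 
needs that same monitor before it can finish.

{code}
// Illustrative sketch only; "self" stands in for the QuorumPeer instance whose
// intrinsic lock both sides contend on.
public class DeadlockSketch {
    private static final Object self = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the QuorumCnxManager listener: before it can exit it
        // ends up in a method synchronized on self (as connectOne is).
        Thread listener = new Thread(() -> {
            sleepQuietly(100);        // simulate work before reaching connectOne
            synchronized (self) {     // blocks forever: shutdown holds this lock
                System.out.println("listener: connectOne");
            }
        }, "Listener");
        listener.start();

        // Stand-in for FLE shutdown during reconfig: it runs while holding the
        // QuorumPeer monitor and then waits for the listener without a timeout.
        synchronized (self) {
            listener.join();          // never returns -> deadlock
        }
    }

    private static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
{code}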

The code path that triggers the deadlock was introduced in ZOOKEEPER-107, so 
this issue only impacts 3.5, not 3.4. I am attaching a patch that fixes the 
issue by specifying a timeout value when joining the listener thread. I am not 
entirely satisfied with this fix, as relying on a timeout is fragile, but it 
does fix the problem (validated that all tests pass with my endurance test 
suites), and the side effect of bailing out seems trivial: the listener thread 
is going to die anyway, and bailing out does not leak any resources.
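
For illustration, the idea of the patch reduces to a bounded join; the class 
name, method name, and timeout value below are placeholders rather than the 
actual patch code.

{code}
// Illustrative sketch of a timed join; not the actual QuorumCnxManager change.
public class TimedJoinSketch {
    // Placeholder bound; the real patch chooses its own timeout value.
    private static final long LISTENER_JOIN_TIMEOUT_MS = 5000;

    public static void haltListener(Thread listener) {
        try {
            // Bounded join: if the listener is stuck waiting for the QuorumPeer
            // lock, shutdown bails out instead of hanging forever.
            listener.join(LISTENER_JOIN_TIMEOUT_MS);
            if (listener.isAlive()) {
                System.err.println("listener still running; continuing shutdown");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}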

I am going to dig deeper into the reconfig logic to see if there is a better 
way to fix the deadlock than bailing out on the listener's side. Meanwhile, 
this harmless patch is ready to go in if we need a quick and dirty way of 
fixing the problem. 

I am also attaching a thread dump that shows the deadlock.

> ReconfigRecoveryTest fails intermittently
> -----------------------------------------
>
>                 Key: ZOOKEEPER-2080
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
>             Project: ZooKeeper
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Michael Han
>         Attachments: jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, 
> repro-20150816.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}


