[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064972#comment-15064972
 ] 

Flavio Junqueira commented on ZOOKEEPER-2347:
---------------------------------------------

The patch looks good, but I'm not really convinced about the test case. It 
relies on the interleaving of events to possibly trigger the problem, so it 
isn't deterministically reproducing the problem in the case it exists. I was 
thinking that maybe a better way of testing this is to set up a pipeline, 
populate {{toFlush}} directly, and just call shutdown on the 
{{ZooKeeperServer}}. If it is possible to do this, then it will be more 
reliable than submitting a bunch of operations and hoping for the race to kick 
in. What do you think? 

> Deadlock shutting down zookeeper
> --------------------------------
>
>                 Key: ZOOKEEPER-2347
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.7
>            Reporter: Ted Yu
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.8
>
>         Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7
> In one of the tests, TestSplitLogManager, there is reproducible hang at the 
> end of the test.
> Below is snippet from stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on 
> condition [0x000000011834b000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00000007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 
> nid=0x9513 waiting on condition [0x0000000118042000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor 
> entry [0x00000001170ac000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x00000007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on 
> condition [0x0000000117a30000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00000007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() 
> [0x0000000108aa1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x00000007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x00000007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x00000007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x00000007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to