Jason:
See the following test, which revealed the deadlock scenario:
https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

On Jenkins, the HBase build has been flaky: the above test sometimes hangs and sometimes passes.
I tend to think that this bug should be fixed for production systems.

Cheers

On Thu, Dec 17, 2015 at 3:33 PM, Jason Rosenberg <j...@squareup.com> wrote:

> Curious if there are specific scenarios which trigger this issue. So far
> we have not seen it where we've upgraded. We have many tests in continuous
> integration that embed zookeeper servers, and so far haven't seen any
> issues.
>
> Jason
>
> On Wed, Dec 16, 2015 at 6:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Thanks, Flavio.
> >
> > When 3.4.8 RC comes out, I will give it a spin.
> >
> > Cheers
> >
> > On Wed, Dec 16, 2015 at 2:59 PM, Flavio Junqueira <f...@apache.org> wrote:
> >
> > > This is bad, we should fix it and release 3.4.8 soon. With the holidays
> > > and such, we won't be able to produce an RC and vote, so I suggest we
> > > target early Jan. In the meanwhile, I'd suggest users to not move to 3.4.7.
> > >
> > > I've reopened ZK-1907 and suggested a fix to this problem.
> > >
> > > -Flavio
> > >
> > > > On 16 Dec 2015, at 21:01, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > Logged ZOOKEEPER-2347
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Dec 16, 2015 at 12:36 PM, Camille Fournier <cami...@apache.org> wrote:
> > > >
> > > >> Blergh. We made shutdown synchronized. But decrementing the requests is
> > > >> also synchronized and called from a different thread. So yeah, deadlock.
> > > >>
> > > >> Can you open a ticket for this? This came in with ZOOKEEPER-1907
> > > >>
> > > >> C
> > > >>
> > > >> On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>> HBase recently upgraded to zookeeper 3.4.7
> > > >>>
> > > >>> In one of the tests, TestSplitLogManager, there is reproducible hang at
> > > >>> the end of the test.
> > > >>> Below is snippet from stack trace related to zookeeper:
> > > >>>
> > > >>> "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on condition [0x000000011834b000]
> > > >>>    java.lang.Thread.State: WAITING (parking)
> > > >>>   at sun.misc.Unsafe.park(Native Method)
> > > >>>   - parking to wait for <0x00000007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > >>>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >>>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > > >>>   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > > >>>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > > >>>
> > > >>> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 nid=0x9513 waiting on condition [0x0000000118042000]
> > > >>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> > > >>>   at java.lang.Thread.sleep(Native Method)
> > > >>>   at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
> > > >>>   at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
> > > >>>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> > > >>>
> > > >>> "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor entry [0x00000001170ac000]
> > > >>>    java.lang.Thread.State: BLOCKED (on object monitor)
> > > >>>   at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
> > > >>>   - waiting to lock <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
> > > >>>   at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
> > > >>>   at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
> > > >>>   at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> > > >>>
> > > >>> "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on condition [0x0000000117a30000]
> > > >>>    java.lang.Thread.State: WAITING (parking)
> > > >>>   at sun.misc.Unsafe.park(Native Method)
> > > >>>   - parking to wait for <0x00000007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > >>>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >>>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> > > >>>   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > > >>>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> > > >>>
> > > >>> "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() [0x0000000108aa1000]
> > > >>>    java.lang.Thread.State: WAITING (on object monitor)
> > > >>>   at java.lang.Object.wait(Native Method)
> > > >>>   - waiting on <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
> > > >>>   at java.lang.Thread.join(Thread.java:1281)
> > > >>>   - locked <*0x00000007c5b66400*> (a org.apache.zookeeper.server.SyncRequestProcessor)
> > > >>>   at java.lang.Thread.join(Thread.java:1355)
> > > >>>   at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
> > > >>>   at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
> > > >>>   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
> > > >>>   - locked <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
> > > >>>   at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
> > > >>>   at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> > > >>>
> > > >>> Note the bold address in the last hunk which seems to indicate some form
> > > >>> of deadlock.
> > > >>>
> > > >>> I can send the full stack trace upon request.
> > > >>> When reverting to 3.4.6, the test passes.
> > > >>>
> > > >>> Comment / hint is welcome.
> > > >>>
> > > >>> Cheers
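
For anyone who wants to see the lock pattern from the trace in isolation, here is a minimal sketch. It is a toy model, not ZooKeeper code: ToyServer, drain() and the join timeout are names and choices I made up for illustration. It only mirrors the shape seen in the dump, where a synchronized shutdown() joins a worker thread while that worker needs the same monitor for a synchronized decInProcess(). The join here has a timeout so the demo terminates; as far as the trace shows, the real join has none, which is why the test hangs.

```java
import java.util.concurrent.CountDownLatch;

public class ShutdownDeadlockSketch {

    /** Hypothetical stand-in for a server; this is NOT ZooKeeper code. */
    static class ToyServer {
        private final CountDownLatch shutdownStarted = new CountDownLatch(1);
        private int inProcess = 1;
        // Plays the role of "SyncThread:0" in the thread dump.
        private final Thread syncThread = new Thread(this::drain, "SyncThread:0");

        ToyServer() {
            syncThread.setDaemon(true);
            syncThread.start();
        }

        // Worker: waits until shutdown has begun, then finishes one request.
        private void drain() {
            try {
                shutdownStarted.await();
            } catch (InterruptedException e) {
                return;
            }
            decInProcess(); // needs the ToyServer monitor
        }

        synchronized void decInProcess() {
            inProcess--;
        }

        // Holds the ToyServer monitor for the whole join, like the
        // synchronized shutdown in the trace. Thread.join waits on the
        // Thread object, so the ToyServer monitor is not released.
        synchronized Thread.State shutdown(long joinTimeoutMs) {
            shutdownStarted.countDown();
            try {
                syncThread.join(joinTimeoutMs); // assumption: timeout added so the sketch ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return syncThread.getState();
        }
    }

    public static void main(String[] args) {
        Thread.State state = new ToyServer().shutdown(500);
        System.out.println("worker state while shutdown holds the lock: " + state);
    }
}
```

With the timeout removed, shutdown() would never return, since the worker cannot enter decInProcess() until shutdown() releases the monitor. One way out, in this toy model at least, is to avoid holding the server monitor across the join; whether that matches the fix suggested on the ticket I have not checked.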