Mohammad Arshad created ZOOKEEPER-4504:
------------------------------------------

             Summary: ZKUtil#deleteRecursive causing deadlock in HDFS HA 
functionality
                 Key: ZOOKEEPER-4504
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
             Project: ZooKeeper
          Issue Type: Bug
            Reporter: Mohammad Arshad
            Assignee: Mohammad Arshad
             Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1


*Problem and Analysis:*
After integrating ZooKeeper 3.6.3, we observed a deadlock in the HDFS HA 
functionality, as shown in the thread dumps below.
{code:java}
"main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x00007f9c017f1000 
nid=0x101b waiting for monitor entry [0x00007f9bda8a6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
        - waiting to lock <0x00000000c17986c0> (a org.apache.hadoop.ha.ActiveStandbyElector)
        at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
{code}
{code:java}
"main" #1 prio=5 os_prio=0 tid=0x00007f9c00060000 nid=0xea3 waiting on 
condition [0x00007f9c06404000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c1b383c8> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
        at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
        at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
        at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
        at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
        at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
        at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
        at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
        at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
        - locked <0x00000000c17986c0> (a org.apache.hadoop.ha.ActiveStandbyElector)
        at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
        at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
        at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
        at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
        at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
{code}
org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is an 
instance-synchronized method and calls ZKUtil.deleteRecursive(zk, pathRoot).

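In 3.6.3, ZKUtil.deleteRecursive uses the async API and, as the "main" thread 
dump shows, waits on a semaphore until the delete callback has run. Callbacks 
are delivered on the client's event thread, and that thread is already blocked 
in ActiveStandbyElector#processWatchEvent, which is also synchronized on the 
ActiveStandbyElector instance.

So there is a deadlock: clearParentZNode() holds the lock and waits for the 
callback, the callback cannot run until processWatchEvent() returns, and 
processWatchEvent() is waiting for clearParentZNode() to release the lock.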
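
The cycle can be reproduced without ZooKeeper or HDFS. The sketch below is an 
illustration under stated assumptions, not code from either project: a 
single-threaded executor stands in for the ZooKeeper client's event thread, the 
class and method names only mirror the ones in the thread dumps, and the 
semaphore wait is bounded with a timeout so the program can report the stall 
instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class DeadlockSketch {
    // Stand-in for the client's single event thread: watch events and
    // async callbacks are both delivered on it, one after another.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sim-EventThread");
        t.setDaemon(true);
        return t;
    });

    // Mirrors ActiveStandbyElector#processWatchEvent: needs the instance monitor.
    synchronized void processWatchEvent() {
        // Nothing to do here; acquiring the monitor is the point.
    }

    // Mirrors ActiveStandbyElector#clearParentZNode: holds the instance monitor
    // while waiting for an async callback. Returns true if the callback ran
    // within timeoutMs.
    synchronized boolean clearParentZNode(long timeoutMs) throws InterruptedException {
        Semaphore done = new Semaphore(0);
        // A watch event arrives first; it blocks on the monitor held by this thread.
        eventThread.execute(this::processWatchEvent);
        // The async delete callback is queued behind it on the same thread.
        eventThread.execute(done::release);
        // The reported stack blocks in Semaphore.acquire() with no timeout;
        // the wait is bounded here only so the stall can be observed.
        return done.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    static boolean callbackRanWhileLockHeld(long timeoutMs) throws InterruptedException {
        DeadlockSketch elector = new DeadlockSketch();
        try {
            return elector.clearParentZNode(timeoutMs);
        } finally {
            elector.eventThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("callback ran while lock held: " + callbackRanWhileLockHeld(200));
    }
}
```

The callback cannot run while the monitor is held, so the bounded wait times 
out and the method returns false. With an unbounded acquire(), as in the stack 
above, the calling thread would wait forever.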

 

*Why did this problem not happen with earlier versions (3.5.x)?*

In earlier ZooKeeper versions, ZKUtil.deleteRecursive used the sync ZooKeeper 
API internally, so no callback was involved and processWatchEvent never came 
into the picture.


*Proposed Fix:*
There are two approaches to fixing this problem. 
1. Fix the problem in HDFS by modifying the HDFS code to avoid the deadlock. 
However, similar bugs may appear in other projects that call this API.
2. Fix the problem in ZooKeeper: restore the old behavior of the existing API 
(use the sync API to delete the znodes) and provide a new overloaded API with 
the new behavior (use the async API to delete the znodes).

I propose fixing the problem with the second approach.
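
The API shape of the second approach could look roughly like the sketch below. 
Everything in it is an assumption made for illustration: FakeClient is a 
hypothetical in-memory stand-in rather than the ZooKeeper client, the batchSize 
overload is only one possible signature, and the paths are made up. It shows 
that the default method finishes on the caller's thread even while the event 
thread is busy, whereas the opt-in overload depends on the event thread being 
free to run callbacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

class DeleteApiSketch {
    /** Hypothetical in-memory stand-in for a ZooKeeper client; not the real API. */
    static class FakeClient {
        final List<String> deleted = Collections.synchronizedList(new ArrayList<>());
        final ExecutorService eventThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "sim-EventThread");
            t.setDaemon(true);
            return t;
        });

        // Synchronous delete: completes on the caller's thread.
        void delete(String path) {
            deleted.add(path);
        }

        // Asynchronous delete: the completion callback runs on the event thread.
        void deleteAsync(List<String> batch, Runnable callback) {
            deleted.addAll(batch);
            eventThread.execute(callback);
        }
    }

    // Default (old) behavior: sync deletes, children before parents.
    // 'tree' lists paths parent-first, so it is walked in reverse.
    static boolean deleteRecursive(FakeClient zk, List<String> tree) {
        for (int i = tree.size() - 1; i >= 0; i--) {
            zk.delete(tree.get(i));
        }
        return true;
    }

    // Opt-in (new) behavior: batched async deletes; callers must not hold a
    // lock that the event thread may need while this method waits.
    static boolean deleteRecursive(FakeClient zk, List<String> tree, int batchSize)
            throws InterruptedException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> leavesFirst = new ArrayList<>(tree);
        Collections.reverse(leavesFirst);
        Semaphore done = new Semaphore(0);
        int batches = 0;
        for (int from = 0; from < leavesFirst.size(); from += batchSize) {
            int to = Math.min(from + batchSize, leavesFirst.size());
            zk.deleteAsync(new ArrayList<>(leavesFirst.subList(from, to)), done::release);
            batches++;
        }
        done.acquire(batches); // blocks until every callback has run
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        FakeClient zk = new FakeClient();
        CountDownLatch unblock = new CountDownLatch(1);
        // Keep the event thread busy, like the blocked processWatchEvent in the report.
        zk.eventThread.execute(() -> {
            try {
                unblock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Made-up paths; the sync variant completes although the event thread is stuck.
        boolean ok = deleteRecursive(zk, Arrays.asList("/root", "/root/a", "/root/a/b"));
        System.out.println("sync delete finished: " + ok + ", deleted " + zk.deleted);
        unblock.countDown();
    }
}
```

Keeping the synchronous path as the default means existing callers such as 
clearParentZNode() get the pre-3.6 behavior back without any code change, and 
only callers that explicitly pick the overload take on the callback dependency.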



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
