[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514253#comment-17514253
 ] 

Christopher Tubbs commented on ZOOKEEPER-4504:
----------------------------------------------

Okay, I've read that, but sorry, it's still not entirely clear. From the 
description, this problem is caused either by a poorly written callback that 
synchronizes in a way it shouldn't, or by this ZK delete function synchronizing 
on the callback instance when it shouldn't. As a user of ZK, I'm looking for 
clarity on what the right thing to do is in general, and what should be avoided 
in order to prevent problems like this bug. The code and analysis of this 
specific manifestation are useful for the developers troubleshooting the 
problem, but they're not very helpful for a user who is just trying to 
understand the high-level view in order to avoid encountering a similar issue. 
I'm looking for the high-level summary, so I know what to avoid as a user of 
the ZK API.
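
To make the general hazard concrete, here is a minimal, hypothetical sketch 
(not the actual HDFS or ZK code; class and method names are invented for 
illustration) of the pattern to avoid: holding a lock while blocking on an 
asynchronous operation whose callback must acquire that same lock on another 
thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the lock-ordering hazard described in this issue.
// A synchronized caller blocks waiting for work that runs on another thread,
// while that thread's callback needs the same instance monitor.
public class DeadlockSketch {
    private final CountDownLatch callbackDone = new CountDownLatch(1);

    // Mimics the shape of clearParentZNode(): holds the instance monitor
    // while waiting for an async operation completed on another thread.
    public synchronized boolean deleteAndWait(long timeoutMs) throws InterruptedException {
        Thread eventThread = new Thread(this::onEvent, "event-thread");
        eventThread.start();
        // Blocks here while still holding the monitor; unlike Object.wait(),
        // awaiting a latch does NOT release the monitor, so onEvent() can
        // never be entered and the latch is never counted down.
        return callbackDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Mimics the shape of processWatchEvent(): synchronized on the same instance.
    private synchronized void onEvent() {
        callbackDone.countDown(); // unreachable while deleteAndWait() holds the lock
    }

    public static void main(String[] args) throws InterruptedException {
        boolean completed = new DeadlockSketch().deleteAndWait(500);
        System.out.println(completed ? "callback ran" : "deadlocked (timed out)");
    }
}
```

This is the same shape as the thread dumps in the issue: the main thread holds 
the elector's monitor while blocking on a semaphore, and the ZK event thread, 
which would release it, is itself blocked trying to enter a synchronized method 
on that elector. The general rule for a ZK user: never block inside a 
synchronized region waiting for work that must run on the ZooKeeper event 
thread.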

> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> ----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4504
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3, we observed a deadlock in HDFS HA 
> functionality, as shown in the thread dumps below.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x00007f9c017f1000 
> nid=0x101b waiting for monitor entry [0x00007f9bda8a6000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>       - waiting to lock <0x00000000c17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>       at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x00007f9c00060000 nid=0xea3 waiting on 
> condition [0x00007f9c06404000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000c1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>       at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>       at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>       at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>       at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>       at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>       - locked <0x00000000c17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>       at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>       at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>       at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>       at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>       at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:360)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>       at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>       at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>       at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot).
> ZKUtil.deleteRecursive is now an async API call, and while it is in progress 
> the event thread invokes ActiveStandbyElector#processWatchEvent, which is 
> synchronized on the same ActiveStandbyElector instance.
> So there is a deadlock: clearParentZNode() is waiting for processWatchEvent() 
> to complete, and processWatchEvent() is waiting for clearParentZNode() to 
> complete.
>  
> *Why was this problem not happening with earlier versions (3.5.x)?*
> In earlier ZK versions, ZKUtil.deleteRecursive used the sync ZK API 
> internally, so no callback (and hence no processWatchEvent) came into the 
> scenario.
> *Proposed Fix:*
> There are two approaches to fixing this problem.
> 1. Fix the problem in HDFS: modify the HDFS code to avoid the deadlock. But 
> we may get similar bugs in other projects.
> 2. Fix the problem in ZK: make the existing API behave the same as the old 
> behavior (use the sync API to delete the ZK nodes) and provide a new 
> overloaded API with the new behavior (use the async API to delete the ZK 
> nodes).
> I propose to fix the problem with the 2nd approach.
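> The overload-based fix in approach 2 might look roughly like this (a sketch 
> of hypothetical signatures, not the committed API; the actual parameter names 
> and return types may differ):
> {code:java}
> // Existing entry point keeps the pre-3.6 behavior: synchronous deletes,
> // no callbacks delivered on the event thread, so no interaction with
> // locks held by the caller.
> public static void deleteRecursive(ZooKeeper zk, String pathRoot)
>         throws InterruptedException, KeeperException;
>
> // New overload opts in to the async, batched deletion behavior
> // introduced in 3.6; callers that use it must not hold locks that the
> // event thread's callbacks also need.
> public static boolean deleteRecursive(ZooKeeper zk, String pathRoot,
>         int batchSize) throws InterruptedException, KeeperException;
> {code}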



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
