[ https://issues.apache.org/jira/browse/HBASE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liwei updated HBASE-7259:
-------------------------

    Attachment:     (was: HConnectionManager.patch)
    
> Deadlock in HBaseClient when KeeperException occured
> ----------------------------------------------------
>
>                 Key: HBASE-7259
>                 URL: https://issues.apache.org/jira/browse/HBASE-7259
>             Project: HBase
>          Issue Type: Bug
>          Components: Zookeeper
>    Affects Versions: 0.94.0, 0.94.1, 0.94.2
>            Reporter: liwei
>            Priority: Critical
>
> After the HBase client had been running for some time, all get operations
> became extremely slow.
> From the client logs I could see the following:
> 1. Unable to get data of znode /hbase/root-region-server
> java.lang.InterruptedException
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:264)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:522)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:498)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:156)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
>         at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
> 2. catalina.out contained a thread dump reporting the following:
> Found one Java-level deadlock:
> =============================
> "catalina-exec-800":
>   waiting to lock monitor 0x000000005f1f6530 (object 0x0000000731902200, a java.lang.Object),
>   which is held by "catalina-exec-710"
> "catalina-exec-710":
>   waiting to lock monitor 0x00002aaab9a05bd0 (object 0x00000007321f8708, a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation),
>   which is held by "catalina-exec-29-EventThread"
> "catalina-exec-29-EventThread":
>   waiting to lock monitor 0x000000005f9f0af0 (object 0x0000000732a9c7e0, a org.apache.hadoop.hbase.zookeeper.RootRegionTracker),
>   which is held by "catalina-exec-710"
> Java stack information for the threads listed above:
> ===================================================
> "catalina-exec-800":
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
>         - waiting to lock <0x0000000731902200> (a java.lang.Object)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:807)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
> "catalina-exec-710":
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:599)
>         - waiting to lock <0x00000007321f8708> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1660)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:158)
>         - locked <0x0000000732a9c7e0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
>         at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
>         - locked <0x0000000731902200> (a java.lang.Object)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:807)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
> "catalina-exec-29-EventThread":
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.stop(ZooKeeperNodeTracker.java:98)
>         - waiting to lock <0x0000000732a9c7e0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:604)
>         - locked <0x00000007321f8708> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1660)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:374)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> Found 1 deadlock.
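The lock-ordering problem in this dump can be reproduced outside HBase. The following is a minimal sketch, not HBase code: the monitors `trackerLock` and `connectionLock` are hypothetical stand-ins for the RootRegionTracker and HConnectionImplementation monitors, and the thread names are made up. A latch ensures each thread holds its first monitor before requesting the second, and `ThreadMXBean.findDeadlockedThreads()` from `java.lang.management` then reports the cycle, much like the "Found one Java-level deadlock" section above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockDemo {

    /** Starts two daemon threads that take two monitors in opposite order and
     *  returns how many deadlocked threads the JVM reports (0 if none are seen). */
    public static int run() {
        final Object trackerLock = new Object();     // hypothetical stand-in for the RootRegionTracker monitor
        final Object connectionLock = new Object();  // hypothetical stand-in for the HConnectionImplementation monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread getter = new Thread(() -> {
            synchronized (trackerLock) {             // like ZooKeeperNodeTracker.getData()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (connectionLock) {      // like abort() -> resetZooKeeperTrackers()
                    // never reached once the deadlock has formed
                }
            }
        }, "getter");

        Thread eventThread = new Thread(() -> {
            synchronized (connectionLock) {          // like resetZooKeeperTrackers()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (trackerLock) {         // like ZooKeeperNodeTracker.stop()
                    // never reached once the deadlock has formed
                }
            }
        }, "event-thread");

        // Daemon threads so the deadlocked pair does not keep the JVM alive.
        getter.setDaemon(true);
        eventThread.setDaemon(true);
        getter.start();
        eventThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + run());
    }
}
```

Because each thread waits for a monitor the other already holds, neither can proceed; on a HotSpot JVM the loop should report both threads.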
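> From the source code, the cause is as follows. ZooKeeperNodeTracker.getData
> fails with a KeeperException while holding the RootRegionTracker lock and
> calls abort(), which tries to run resetZooKeeperTrackers and therefore needs
> the HConnectionImplementation lock. At the same time, the ZooKeeper
> ClientCnxn.EventThread handles a connection event by calling abort() as well:
> it takes the HConnectionImplementation lock in resetZooKeeperTrackers and
> then waits for the RootRegionTracker lock in ZooKeeperNodeTracker.stop. The
> two threads acquire the same two locks in opposite order, so they deadlock.
> Other request threads (such as catalina-exec-800) then block behind
> catalina-exec-710 in locateRegionInMeta.
> To avoid the problem, we can add a check to abortable.abort() so that the
> trackers are not reset again while a reset is already in progress.
> See the patch.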
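The proposed guard in abort() can be sketched as follows. This is a hypothetical illustration, not the removed HConnectionManager.patch and not HBase code: the class `GuardedResetSketch`, the `resetting` flag and all method names are assumptions. It reads the suggested condition as an `AtomicBoolean` that abort() sets with `compareAndSet` before requesting the connection monitor, so a caller that holds the tracker monitor while another reset is running returns immediately instead of blocking.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedResetSketch {
    private final Object trackerLock = new Object();   // hypothetical stand-in for the RootRegionTracker monitor
    private final AtomicBoolean resetting = new AtomicBoolean(false);
    private int resetCount = 0;                          // guarded by this

    /** Like abort(): skip the reset if another thread is already performing one. */
    public void abort() {
        // The flag is checked before the connection monitor (this) is requested,
        // so a caller holding trackerLock while another reset runs returns
        // instead of blocking on the connection monitor.
        if (!resetting.compareAndSet(false, true)) {
            return;
        }
        try {
            resetTrackers();
        } finally {
            resetting.set(false);
        }
    }

    // Takes the connection monitor, then the tracker monitor, mirroring
    // resetZooKeeperTrackers() followed by ZooKeeperNodeTracker.stop().
    private synchronized void resetTrackers() {
        synchronized (trackerLock) {
            resetCount++;
        }
    }

    /** Like a failing getData(): calls abort() while holding the tracker monitor. */
    public void failingGetData() {
        synchronized (trackerLock) {
            abort();
        }
    }

    synchronized int completedResets() {
        return resetCount;
    }

    /** Races failingGetData() against abort(); returns false if any round hangs or no reset ran. */
    public static boolean raceCompletes(int rounds) {
        for (int i = 0; i < rounds; i++) {
            GuardedResetSketch conn = new GuardedResetSketch();
            Thread getter = new Thread(conn::failingGetData);
            Thread eventThread = new Thread(conn::abort);
            getter.setDaemon(true);
            eventThread.setDaemon(true);
            getter.start();
            eventThread.start();
            try {
                getter.join(2000);
                eventThread.join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (getter.isAlive() || eventThread.isAlive() || conn.completedResets() < 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all rounds completed: " + raceCompletes(1000));
    }
}
```

The trade-off in this sketch is that the losing caller skips its own reset and relies on the one already in progress; whether that is acceptable for the real HConnectionImplementation depends on what abort() must guarantee, which the sketch does not model.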
