[ https://issues.apache.org/jira/browse/HBASE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liwei updated HBASE-7259:
-------------------------

    Attachment:     (was: HConnectionManager.patch)

> Deadlock in HBaseClient when KeeperException occured
> ----------------------------------------------------
>
>                 Key: HBASE-7259
>                 URL: https://issues.apache.org/jira/browse/HBASE-7259
>             Project: HBase
>          Issue Type: Bug
>          Components: Zookeeper
>    Affects Versions: 0.94.0, 0.94.1, 0.94.2
>            Reporter: liwei
>            Priority: Critical
>
> After the HBase client had been running for a period of time, all get operations became very slow.
> From the client logs I could see the following:
>
> 1. Unable to get data of znode /hbase/root-region-server
> java.lang.InterruptedException
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:264)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:522)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:498)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:156)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
>         at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
>
> 2. In catalina.out:
> Found one Java-level deadlock:
> =============================
> "catalina-exec-800":
>   waiting to lock monitor 0x000000005f1f6530 (object 0x0000000731902200, a java.lang.Object),
>   which is held by "catalina-exec-710"
> "catalina-exec-710":
>   waiting to lock monitor 0x00002aaab9a05bd0 (object 0x00000007321f8708, a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation),
>   which is held by "catalina-exec-29-EventThread"
> "catalina-exec-29-EventThread":
>   waiting to lock monitor 0x000000005f9f0af0 (object 0x0000000732a9c7e0, a org.apache.hadoop.hbase.zookeeper.RootRegionTracker),
>   which is held by "catalina-exec-710"
>
> Java stack information for the threads listed above:
> ===================================================
> "catalina-exec-800":
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
>         - waiting to lock <0x0000000731902200> (a java.lang.Object)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:807)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
> "catalina-exec-710":
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:599)
>         - waiting to lock <0x00000007321f8708> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1660)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:158)
>         - locked <0x0000000732a9c7e0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
>         at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
>         at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
>         - locked <0x0000000731902200> (a java.lang.Object)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:807)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:725)
>         at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:685)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:366)
> "catalina-exec-29-EventThread":
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.stop(ZooKeeperNodeTracker.java:98)
>         - waiting to lock <0x0000000732a9c7e0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:604)
>         - locked <0x00000007321f8708> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1660)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:374)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
>
> Found 1 deadlock.
>
> From the source code, the problem is triggered when ZooKeeperNodeTracker.getData runs into a KeeperException.
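> The calling thread then tries to reset the ZooKeeper trackers (abort() -> resetZooKeeperTrackers()) while still inside getData. At the same time, the ZooKeeper ClientCnxn.EventThread also calls resetZooKeeperTrackers. Because getData already holds the lock of the ZooKeeperNodeTracker (here the RootRegionTracker), the two threads acquire the same two locks in opposite orders:
> - the getData thread (catalina-exec-710) holds the RootRegionTracker lock and waits for the HConnectionImplementation lock;
> - the EventThread holds the HConnectionImplementation lock and waits for the RootRegionTracker lock in ZooKeeperNodeTracker.stop().
> So the deadlock happens. As catalina-exec-800 shows, other get calls then block on the java.lang.Object lock that catalina-exec-710 holds in locateRegionInMeta.
> To avoid the problem, we can add a check in abortable.abort() so that it skips the reset when a reset is already in progress.
> See the patch.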
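The lock-order inversion described above can be reproduced outside HBase with two monitors. The sketch below is a hypothetical model, not HBase code and not the attached patch: `Conn` stands in for HConnectionImplementation, its `tracker` field for the RootRegionTracker monitor, `abort()`/`resetTrackers()` for abort()/resetZooKeeperTrackers(), and the two threads for the getData caller and the ZooKeeper EventThread. The `guarded` flag models the proposed "reset already in progress" check as an AtomicBoolean, which is an assumption about how such a check could look. Latches force the bad interleaving, and `ThreadMXBean.findMonitorDeadlockedThreads()` reports whether the two threads deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the HBASE-7259 lock cycle; not HBase code.
class LockOrderDemo {

    // Stands in for HConnectionImplementation; `tracker` stands in for the
    // RootRegionTracker monitor that ZooKeeperNodeTracker.getData() holds.
    static final class Conn {
        final Object tracker = new Object();
        final boolean guarded;
        final AtomicBoolean resetting = new AtomicBoolean(false);
        final CountDownLatch connLocked = new CountDownLatch(1); // test hook only

        Conn(boolean guarded) { this.guarded = guarded; }

        // Models abort(): when guarded, skip the reset if one is already in progress.
        void abort() {
            if (guarded && !resetting.compareAndSet(false, true)) {
                return; // another thread is resetting; never wait for the connection lock
            }
            try {
                resetTrackers();
            } finally {
                if (guarded) {
                    resetting.set(false);
                }
            }
        }

        // Models resetZooKeeperTrackers(): connection lock first, then the tracker
        // lock (the lock ZooKeeperNodeTracker.stop() needs).
        synchronized void resetTrackers() {
            connLocked.countDown();
            synchronized (tracker) {
                // stop and restart the tracker (omitted)
            }
        }
    }

    // Returns true if the two model threads end up deadlocked.
    static boolean run(boolean guarded) {
        Conn conn = new Conn(guarded);
        CountDownLatch trackerHeld = new CountDownLatch(1);

        // Models catalina-exec-710: getData() holds the tracker lock, then calls abort().
        Thread caller = new Thread(() -> {
            synchronized (conn.tracker) {
                trackerHeld.countDown();
                awaitQuietly(conn.connLocked); // let the event thread lock the connection first
                conn.abort();
            }
        }, "caller");

        // Models the ZooKeeper EventThread: abort() from connectionEvent().
        Thread event = new Thread(() -> {
            awaitQuietly(trackerHeld);
            conn.abort();
        }, "event");

        caller.setDaemon(true); // deadlocked threads must not keep the JVM alive
        event.setDaemon(true);
        caller.start();
        event.start();
        try {
            caller.join(1000);
            event.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Only count deadlocks that involve this run's threads.
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        if (ids == null) {
            return false;
        }
        for (long id : ids) {
            if (id == caller.getId() || id == event.getId()) {
                return true;
            }
        }
        return false;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded deadlocked: " + run(false));
        System.out.println("guarded deadlocked:   " + run(true));
    }
}
```

Running `main` should report a deadlock for the unguarded run and none for the guarded one. With the guard, whichever thread wins the compare-and-set is the only one that takes the connection lock, so the thread already holding the tracker lock returns instead of waiting. The trade-off is that the losing caller returns from `abort()` before the reset finishes, so a real client would still have to cope with a tracker that is being restarted.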