[ https://issues.apache.org/jira/browse/HBASE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931769#action_12931769 ]
Benoit Sigoure commented on HBASE-2121:
---------------------------------------

Hey Gary, if you have a multi-threaded HBase app, I recommend you take a look at asynchbase (https://github.com/stumbleupon/asynchbase). It's an alternative HBase client that was designed to be thread-safe and non-blocking from the ground up.

> HBase client doesn't retry the right number of times when a region is unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2121
>                 URL: https://issues.apache.org/jira/browse/HBASE-2121
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2, 0.90.0
>            Reporter: Benoit Sigoure
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries retries 10 times (by default). It ends up calling HConnectionManager$TableServers.locateRegionInMeta, which retries 10 times on its own. So the HBase client is effectively retrying 100 times before giving up, instead of 10 (10 is the default hbase.client.retries.number).
> I'm using hbase trunk HEAD. I verified this bug is also in 0.20.2.
> Sample call stack:
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline: mytable,,1263421423787
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:709)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:640)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:609)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:430)
>         at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>         at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1047)
>         at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:836)
>         at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:756)
>         at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:354)
>         at <my application>
> How to reproduce:
> with a trivial HBase client (mine was just trying to scan the table), start the client, take offline the table the client uses, tell the client to start the scan. The client will not give up after 10 attempts, unlike what it's supposed to do.
> If locateRegionInMeta is only ever called from getRegionServerWithRetries, then the fix is trivial: just remove the retry logic in there. If it has some other callers who possibly relied on the retry logic in locateRegionInMeta, then the fix is going to be a bit more involved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
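
For illustration, the retry multiplication described in the issue can be reproduced without HBase at all. The sketch below is not the HBase client code: `NestedRetryDemo`, `lookupOnce`, `locateWithRetries` and `countAttempts` are hypothetical names, and the always-failing lookup only stands in for a region that stays offline. It assumes both loops use the same limit of 10, as the report says, and counts how many lookups happen when an outer retry loop wraps an inner one, compared with the case where the inner loop makes a single attempt (roughly what removing the inner retry logic would amount to).

```java
import java.util.concurrent.atomic.AtomicInteger;

class NestedRetryDemo {
    // Stand-in for hbase.client.retries.number (10 by default, per the report).
    static final int RETRIES = 10;

    // Counts how many times the simulated lookup is attempted.
    static final AtomicInteger attempts = new AtomicInteger();

    // Simulated region lookup that always fails, as if the table were offline.
    static void lookupOnce() {
        attempts.incrementAndGet();
        throw new IllegalStateException("region offline");
    }

    // Inner retry loop (hypothetical stand-in for locateRegionInMeta).
    static void locateWithRetries(int innerRetries) {
        IllegalStateException last = new IllegalStateException("no attempts made");
        for (int i = 0; i < innerRetries; i++) {
            try {
                lookupOnce();
                return;
            } catch (IllegalStateException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // inner budget exhausted, propagate to the outer loop
    }

    // Outer retry loop (hypothetical stand-in for getRegionServerWithRetries).
    // Returns the total number of lookups performed before giving up.
    static int countAttempts(int outerRetries, int innerRetries) {
        attempts.set(0);
        for (int i = 0; i < outerRetries; i++) {
            try {
                locateWithRetries(innerRetries);
                break;
            } catch (IllegalStateException e) {
                // the outer loop retries the whole inner loop
            }
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // Both levels retry: 10 * 10 lookups before giving up.
        System.out.println("nested retries: " + countAttempts(RETRIES, RETRIES));
        // Only the outer level retries: 10 lookups before giving up.
        System.out.println("outer only:     " + countAttempts(RETRIES, 1));
    }
}
```

With both levels retrying, the counter reaches 100; with a single attempt in the inner call, it stops at 10, which is what hbase.client.retries.number suggests. Whether the inner retries can simply be dropped in the real client still depends on the other callers of locateRegionInMeta, as noted in the issue.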