[ 
https://issues.apache.org/jira/browse/HBASE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931632#action_12931632
 ] 

Gary Gilbert commented on HBASE-2121:
-------------------------------------

Not only does the client retry the wrong number of times, there is another 
annoying side effect.  When the inner loop completes, it sleeps.  However, the 
inner loop is entirely within a synchronized section.  My application had 
multiple HTable objects in different threads, and they waited serially for each 
of them to finally give up.  As each retry was on the order of an hour, the 
application didn't completely fail until all 5 HTable objects had reported the 
same error, 5 hours later.  The inner loop ought to return the failure outside 
the synchronized section and be re-driven from above.  Then all of the threads 
would be in the midst of retrying at the same time.  Granted, this was mostly 
just annoying, but it made it difficult to kill the application in a friendly way. 
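
A minimal sketch of the restructuring suggested above, not the actual HBase 
code.  The names RetryOutsideLockSketch, LOCK, tryOnce and withRetries are 
hypothetical, as is passing the pause in as a Runnable.  A single attempt runs 
inside the synchronized section and returns its failure to the caller; the 
retry loop and the pause between attempts live outside the lock, so other 
threads can make their own attempts while this one waits.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names come from HBase.
class RetryOutsideLockSketch {
    // Stand-in for whatever monitor the real client synchronizes on.
    static final Object LOCK = new Object();

    // One attempt, made while holding the lock. The result (including a
    // failure) is returned to the caller instead of being retried here.
    static boolean tryOnce(BooleanSupplier attempt) {
        synchronized (LOCK) {
            return attempt.getAsBoolean();
        }
    }

    // The retry loop is driven from above, outside the synchronized section.
    // 'pause' would normally sleep; the lock is NOT held while it runs.
    static boolean withRetries(BooleanSupplier attempt, int maxAttempts, Runnable pause) {
        for (int i = 0; i < maxAttempts; i++) {
            if (tryOnce(attempt)) {
                return true;
            }
            if (i < maxAttempts - 1) {
                pause.run(); // no pause after the final failed attempt
            }
        }
        return false;
    }
}
```

With this shape, five threads that each call withRetries pause concurrently 
rather than queuing behind one another for the lock.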

> HBase client doesn't retry the right number of times when a region is unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2121
>                 URL: https://issues.apache.org/jira/browse/HBASE-2121
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2, 0.90.0
>            Reporter: Benoit Sigoure
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries
>  retries 10 times (by default).   It ends up calling 
> HConnectionManager$TableServers.locateRegionInMeta, which retries 10 times on 
> its own.  So the HBase client is effectively retrying 100 times before giving 
> up, instead of 10 (10 is the default hbase.client.retries.number).
> I'm using hbase trunk HEAD.  I verified this bug is also in 0.20.2.
> Sample call stack:
>  org.apache.hadoop.hbase.client.RegionOfflineException: region offline: mytable,,1263421423787
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:709)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:640)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:609)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:430)
>       at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>       at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1047)
>       at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:836)
>       at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:756)
>       at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:354)
>       at <my application>
> How to reproduce:
> with a trivial HBase client (mine was just trying to scan the table), start 
> the client, take offline the table the client uses, tell the client to start 
> the scan.  The client will not give up after 10 attempts, unlike what it's 
> supposed to do.
> If locateRegionInMeta is only ever called from getRegionServerWithRetries, 
> then the fix is trivial: just remove the retry logic in there.  If it has 
> some other callers who possibly relied on the retry logic in 
> locateRegionInMeta, then the fix is going to be a bit more involved.
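
The multiplication of retries described in the quoted report can be 
illustrated with a small, self-contained sketch.  It is not HBase code: 
NestedRetrySketch, locateWithInnerRetries and countAttempts are hypothetical 
stand-ins for the two nested retry loops, and the lookup is assumed to fail 
every time, as it would for an offline region.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from HBase.
class NestedRetrySketch {
    // The report gives 10 as the default for hbase.client.retries.number.
    static final int RETRIES = 10;

    // Stand-in for the inner lookup: it retries on its own, counting every
    // attempt, and then gives up by throwing.
    static void locateWithInnerRetries(AtomicInteger attempts) {
        for (int i = 0; i < RETRIES; i++) {
            attempts.incrementAndGet(); // one failed lookup attempt
        }
        throw new IllegalStateException("region offline");
    }

    // Stand-in for the outer call: it also retries RETRIES times, and each
    // retry runs the whole inner loop again.
    static int countAttempts() {
        AtomicInteger attempts = new AtomicInteger();
        for (int i = 0; i < RETRIES; i++) {
            try {
                locateWithInnerRetries(attempts);
            } catch (IllegalStateException e) {
                // swallow the failure and let the outer loop retry
            }
        }
        return attempts.get();
    }
}
```

Under these assumptions countAttempts() returns RETRIES * RETRIES, that is 
100 attempts instead of the configured 10.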

