[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625485#comment-13625485
 ] 

Varun Sharma commented on HBASE-8285:
-------------------------------------

@Ram,

Consider the retries:

1) Try #1

connect(retries != 0) is called, which on the first try is the same as 
connect(reload=false). This flag is passed on to the relocateRegion() function, 
meaning that the cache entries already available are used.

2) Try #2

connect(reload=true) is called, refreshing the cache correctly.

These retries come from the hbase.client.retries.number entry in 
hbase-site.xml.
For most setups this value is greater than 1, so the issue has not been widely 
detected. In our setup, we run with retries=1 since the retry logic lives in 
the app layer. Hence step #2 above is never reached, and so 
connect(reload=true) is never called either.
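
To make the effect of retries=1 concrete, here is a minimal sketch of the 
loop described above. It is not the actual ServerCallable code: the class and 
method names are hypothetical, and the only thing taken from the discussion is 
that each try calls connect(reload) with reload = (try index != 0).

```java
import java.util.ArrayList;
import java.util.List;

public class RetryReloadSketch {
    /**
     * Returns the reload flag that each try would pass to connect(),
     * assuming the loop calls connect(tries != 0) as described above.
     * (Hypothetical helper, for illustration only.)
     */
    static List<Boolean> reloadFlags(int numRetries) {
        List<Boolean> flags = new ArrayList<>();
        for (int tries = 0; tries < numRetries; tries++) {
            boolean reload = tries != 0; // the first try always trusts the cache
            flags.add(reload);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("retries=1: " + reloadFlags(1)); // [false]
        System.out.println("retries=3: " + reloadFlags(3)); // [false, true, true]
    }
}
```

With a single try the flag is never true, so a stale cache entry is reused on 
every call.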

So, I think we need a check like this:

If we get an NSRE and this is the 1st retry, relocate only the specific 
region rather than relocating everything, as Nicholas suggested.
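
A rough sketch of that check, under stated assumptions: everything below is a 
hypothetical stand-in (two maps model the client cache and META, and the 
exception is modelled as unchecked to keep the sketch short), and "1st retry" 
is read as the first try of the loop. It is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

public class NsreCheckSketch {
    /** Hypothetical stand-in for NotServingRegionException (unchecked here). */
    static class NotServingRegionException extends RuntimeException {}

    /** Hypothetical client-side cache: region name -> server name. */
    static final Map<String, String> cache = new HashMap<>();
    /** Hypothetical authoritative locations, i.e. what META would return. */
    static final Map<String, String> meta = new HashMap<>();

    /** Refreshes the cache entry for one region only. */
    static void relocateRegion(String region) {
        cache.put(region, meta.get(region));
    }

    /** Simulated get(): fails with NSRE when the cached server is stale. */
    static String get(String region) {
        String server = cache.get(region);
        if (server == null || !server.equals(meta.get(region))) {
            throw new NotServingRegionException();
        }
        return "value-from-" + server;
    }

    static String getWithRetries(String region, int numRetries) {
        if (numRetries < 1) {
            throw new IllegalArgumentException("numRetries must be >= 1");
        }
        NotServingRegionException last = null;
        for (int tries = 0; tries < numRetries; tries++) {
            try {
                return get(region);
            } catch (NotServingRegionException e) {
                last = e;
                if (tries == 0) {
                    relocateRegion(region); // purge just the region that moved
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        meta.put("r1", "rs2");  // region has moved to rs2
        cache.put("r1", "rs1"); // client still points at rs1
        try {
            getWithRetries("r1", 1);
        } catch (NotServingRegionException e) {
            System.out.println("first call failed with NSRE");
        }
        System.out.println(getWithRetries("r1", 1)); // value-from-rs2
    }
}
```

Even with retries=1 the first call still fails, but the next call from the 
app layer finds a fresh cache entry and succeeds.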

Does this make sense?
                
> HBaseClient never recovers for single HTable.get() calls with no retries when 
> regions move
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8285
>                 URL: https://issues.apache.org/jira/browse/HBASE-8285
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.94.6.1
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>            Priority: Critical
>             Fix For: 0.98.0, 0.94.7, 0.95.1
>
>         Attachments: 8285-0.94.txt, 8285-trunk.txt
>
>
> Steps to reproduce this bug:
> 1) Gracefully restart a region server, causing regions to get redistributed.
> 2) Client calls to this region keep failing since the Meta Cache is never 
> purged on the client for the region that moved.
> Reason behind the bug:
> 1) Client continues to hit the old region server.
> 2) The old region server throws NotServingRegionException which is not 
> handled correctly and the META cache entries are never purged for that server 
> causing the client to keep hitting the old server.
> The reason lies in ServerCallable code since we only purge META cache entries 
> when there is a RetriesExhaustedException, SocketTimeoutException or 
> ConnectException. However, there is no case check for 
> NotServingRegionException(s).
> Why is this not a problem for Scan(s) and Put(s)?
> a) If a region server is not hosting a region/scanner, then an 
> UnknownScannerException is thrown which causes a relocateRegion() call 
> causing a refresh of the META cache for that particular region.
> b) For put(s), the processBatchCallback() interface in HConnectionManager is 
> used, which clears out META cache entries for all kinds of exceptions except 
> DoNotRetryIOException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
