[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

huaxiang sun (JIRA) Wed, 10 May 2017 23:32:27 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005964#comment-16005964
 ]


huaxiang sun commented on HBASE-18005:
--------------------------------------

Hi [~leochen4891], thanks for bringing up the jira. Setting 
hbase.meta.replica.count to 2 or 3 improves this case a lot, but not 
completely. As I understand it, meta table replication still uses the 
phase 1 approach (i.e., refreshing the HFiles). In theory, a client that goes 
to the meta replicas can get stale data (wrong region locations). Before the 
primary meta region is assigned, the client could run into get errors. I will 
try changing the unit test case to use meta replicas to see whether I can 
reproduce the issue with them.

> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18005
>                 URL: https://issues.apache.org/jira/browse/HBASE-18005
>             Project: HBase
>          Issue Type: Bug
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-18005-master-001.patch
>
>
> Identified one corner case in testing: when the region server hosting 
> both the primary replica and the meta region is down, the client tries to 
> reload the primary replica's location from the meta table. It is supposed 
> to clean up only the cached location for the specific replicaId, but it 
> clears the cached locations for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions (including the meta region) to be 
> reassigned, the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the client needs to fall back to the 
> cached locations (in this case, the primary replica's location is not 
> available). If there are cached locations for other replicas, it can still 
> go ahead and get stale values from the secondary replicas.
> With meta replicas, it still helps not to clean up the caches for all 
> replicas, as the info from the primary meta replica is up-to-date.
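
The behavior described above can be illustrated with a minimal sketch. This is 
not the HBase client code: ReplicaLocationCacheSketch, MetaLookup, cache() and 
relocate() are hypothetical names made up for illustration, and the cache is 
reduced to a replicaId-to-server map. It only shows the idea of dropping the 
entry for one replicaId and continuing with the remaining cached locations 
when the meta lookup throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for a client-side region location cache. */
final class ReplicaLocationCacheSketch {

    /** Hypothetical stand-in for a meta table lookup; may fail while meta is unassigned. */
    interface MetaLookup {
        String locate(int replicaId) throws Exception;
    }

    // replicaId -> server name; replicaId 0 is the primary replica.
    private final Map<Integer, String> cached = new TreeMap<>();

    synchronized void cache(int replicaId, String server) {
        cached.put(replicaId, server);
    }

    /** Drops only the stale replica's entry, then tries to reload it from meta. */
    synchronized List<String> relocate(int staleReplicaId, MetaLookup meta) {
        cached.remove(staleReplicaId); // deliberately not cached.clear()
        try {
            cached.put(staleReplicaId, meta.locate(staleReplicaId));
        } catch (Exception e) {
            // Meta is not reachable yet: continue with whatever is still cached.
        }
        return new ArrayList<>(cached.values()); // ordered by replicaId
    }

    public static void main(String[] args) {
        ReplicaLocationCacheSketch cache = new ReplicaLocationCacheSketch();
        cache.cache(0, "rs1");
        cache.cache(1, "rs2");
        cache.cache(2, "rs3");
        // rs1 hosted both the primary replica and meta, so the lookup fails.
        List<String> usable = cache.relocate(0, id -> {
            throw new IllegalStateException("meta region not assigned yet");
        });
        System.out.println(usable); // prints [rs2, rs3]
    }
}
```

In this sketch the secondary locations survive the failed reload, so a read 
that tolerates stale values could still be sent to them. If relocate() cleared 
the whole map instead, the returned list would be empty until meta is 
reassigned.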



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
