[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

huaxiang sun (JIRA) Tue, 06 Jun 2017 09:02:58 -0700

    [ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039151#comment-16039151 ]


huaxiang sun commented on HBASE-18005:
--------------------------------------

Thanks [[email protected]]. I ran the failed unit test TestReplicasClient 
locally with my patch on branch-1, and it passed. The FindBugs warning is:
{code}
RV      Return value of java.util.concurrent.CountDownLatch.await(long, TimeUnit) ignored in org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper()

    if (this.initLatch != null) {
      this.initLatch.await(20, TimeUnit.SECONDS);
    }

{code}

This is existing code that this patch does not touch. We can check the 
return value and log a message to get rid of the FindBugs warning.
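
A minimal sketch of what checking the return value could look like. The class 
name InitLatchWait, the method awaitInit, and the log message are hypothetical 
and not part of the patch or of HRegionServer. Unlike the snippet above, this 
sketch handles InterruptedException itself so that it stays self-contained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only; not HBase code.
final class InitLatchWait {
  private static final Logger LOG = Logger.getLogger(InitLatchWait.class.getName());

  /**
   * Waits on the latch, if there is one, and reports whether it reached zero
   * before the timeout. A null latch counts as "nothing to wait for".
   */
  static boolean awaitInit(CountDownLatch initLatch, long timeout, TimeUnit unit) {
    if (initLatch == null) {
      return true;
    }
    try {
      // await(long, TimeUnit) returns false when the timeout elapses first.
      boolean released = initLatch.await(timeout, unit);
      if (!released) {
        LOG.warning("initLatch was not counted down within " + timeout + " " + unit);
      }
      return released;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Using the boolean result, even if only to log, is what makes the FindBugs RV 
warning go away.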


> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18005
>                 URL: https://issues.apache.org/jira/browse/HBASE-18005
>             Project: HBase
>          Issue Type: Bug
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-18005-branch-1-v001.patch, 
> HBASE-18005-master-001.patch, HBASE-18005-master-002.patch, 
> HBASE-18005-master-003.patch, HBASE-18005-master-004.patch, 
> HBASE-18005-master-005.patch, HBASE-18005-master-006.patch
>
>
> During testing, we identified a corner case: when the region server hosting 
> both the primary replica and the meta region is down, the client tries to 
> reload the primary replica location from the meta table. It is supposed to 
> clean up only the cached location for the specific replicaId, but it clears 
> the caches for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions (including the meta region) to be 
> reassigned, the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the client needs to fall back to the 
> cached locations (in this case, the primary replica's location is not 
> available). If there are cached locations for other replicas, it can still 
> go ahead and get stale values from secondary replicas.
> With meta replicas, it still helps not to clean up the caches for all 
> replicas, as the info from the primary meta replica is up-to-date.
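
The difference between clearing one replica's cached location and clearing all 
of them can be sketched as follows. ReplicaLocationCache and its methods are 
hypothetical names used only for illustration; they are not the HBase client's 
actual cache classes or API.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-region location cache; not HBase code.
final class ReplicaLocationCache {
  // Index is the replicaId (0 = primary); null means "no cached location".
  private final String[] locations;

  ReplicaLocationCache(String... locations) {
    this.locations = locations.clone();
  }

  /** The behavior described as the bug: every replica's location is dropped. */
  void clearAll() {
    Arrays.fill(locations, null);
  }

  /** The intended behavior: only the given replica's location is dropped. */
  void clear(int replicaId) {
    locations[replicaId] = null;
  }

  /** Returns the first location still cached, or null if none is left. */
  String firstAvailable() {
    for (String location : locations) {
      if (location != null) {
        return location;
      }
    }
    return null;
  }
}
```

With clear(0), a read can still fall back to a secondary replica's cached 
location while the primary and meta regions are being reassigned; with 
clearAll(), nothing is left to fall back to.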



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
