[ 
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193395#comment-16193395
 ] 

huaxiang sun commented on HBASE-18946:
--------------------------------------

Thanks [~ram_krish] for the finding! I checked the code, and I think this is 
caused by the fact that replica regions are added to the list first, and all 
default regions are then appended at the end of the list.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionReplicaUtil.java#L187

For example, take 3 RSes and two regions with 3 replicas each.

{code}
regionId-replicaId
0-1    0-2   1-1    1-2   0-0   1-0

R0           R1              R2
0-1         0-2               1-1
1-2         0-0               1-0
{code}

We can see that on R1 and R2, replicas of the same region are assigned to the 
same RS.
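
The table above can be reproduced with a small standalone sketch. It assumes 
the list is simply dealt to the servers round-robin in list order (which is 
what the table implies) and uses plain "regionId-replicaId" strings instead of 
real RegionInfo objects. The class and method names are hypothetical and are 
not part of HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RoundRobinReplicaDemo {
  // Deal "regionId-replicaId" entries to servers in list order, one per server per pass.
  // Assumption: this mimics a plain round-robin assignment, not the real balancer.
  public static List<List<String>> roundRobin(List<String> regions, int numServers) {
    List<List<String>> servers = new ArrayList<>();
    for (int s = 0; s < numServers; s++) {
      servers.add(new ArrayList<>());
    }
    for (int i = 0; i < regions.size(); i++) {
      servers.get(i % numServers).add(regions.get(i));
    }
    return servers;
  }

  // Count the servers that host more than one replica of the same region.
  public static int serversWithColocatedReplicas(List<List<String>> servers) {
    int count = 0;
    for (List<String> hosted : servers) {
      Set<String> regionIds = new HashSet<>();
      boolean colocated = false;
      for (String entry : hosted) {
        String regionId = entry.substring(0, entry.indexOf('-'));
        if (!regionIds.add(regionId)) {
          colocated = true;
        }
      }
      if (colocated) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Replicas first, default regions (replicaId 0) at the end, as in the example.
    List<String> order = Arrays.asList("0-1", "0-2", "1-1", "1-2", "0-0", "1-0");
    List<List<String>> servers = roundRobin(order, 3);
    System.out.println(servers);                               // [[0-1, 1-2], [0-2, 0-0], [1-1, 1-0]]
    System.out.println(serversWithColocatedReplicas(servers)); // 2
  }
}
```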

If the logic is changed a bit as follows, it should fix this issue. Other 
places need to be checked as well.
{code}
  public static List<RegionInfo> addReplicas(final TableDescriptor tableDescriptor,
      final List<RegionInfo> regions, int oldReplicaCount, int newReplicaCount) {
    if ((newReplicaCount - 1) <= 0) {
      return regions;
    }
    List<RegionInfo> hRegionInfos = new ArrayList<>((newReplicaCount) * regions.size());
    for (int i = 0; i < regions.size(); i++) {
      if (RegionReplicaUtil.isDefaultReplica(regions.get(i))) {
        // Add the default region first so that its replicas directly follow it.
        hRegionInfos.add(regions.get(i));
        // region level replica index starts from 0. So if oldReplicaCount was 2 then
        // the max replicaId for the existing regions would be 1
        for (int j = oldReplicaCount; j < newReplicaCount; j++) {
          hRegionInfos.add(RegionReplicaUtil.getRegionInfoForReplica(regions.get(i), j));
        }
      }
    }
    // hRegionInfos.addAll(regions);
    return hRegionInfos;
  }
{code}
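
To illustrate why the interleaved ordering helps, here is a minimal sketch 
with plain strings rather than RegionInfo objects. It assumes a new table 
(oldReplicaCount = 1) and a plain round-robin deal of the list to the servers. 
The class and method names are hypothetical, not HBase APIs. Under these 
assumptions, the replicas of one region occupy consecutive list positions, so 
they land on distinct servers as long as the replica count does not exceed the 
number of servers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InterleavedReplicaOrderDemo {
  // Hypothetical stand-in for the proposed ordering: each default replica
  // (replicaId 0) is followed immediately by its own replicas.
  public static List<String> interleavedOrder(int numRegions, int replicaCount) {
    List<String> order = new ArrayList<>(numRegions * replicaCount);
    for (int region = 0; region < numRegions; region++) {
      for (int replica = 0; replica < replicaCount; replica++) {
        order.add(region + "-" + replica);
      }
    }
    return order;
  }

  // True if dealing the list round-robin puts two replicas of one region on the same server.
  public static boolean hasColocation(List<String> order, int numServers) {
    Set<String> seen = new HashSet<>();
    for (int i = 0; i < order.size(); i++) {
      String entry = order.get(i);
      String regionId = entry.substring(0, entry.indexOf('-'));
      // Key is "serverIndex:regionId"; a repeated key means a colocation.
      if (!seen.add((i % numServers) + ":" + regionId)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> order = interleavedOrder(2, 3);
    System.out.println(order);                   // [0-0, 0-1, 0-2, 1-0, 1-1, 1-2]
    System.out.println(hasColocation(order, 3)); // false
  }
}
```

With 3 servers this deals 0-0 and 1-0 to R0, 0-1 and 1-1 to R1, and 0-2 and 
1-2 to R2, so no RS holds two replicas of the same region.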


> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
>                 Key: HBASE-18946
>                 URL: https://issues.apache.org/jira/browse/HBASE-18946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha-3
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0-beta-1
>
>         Attachments: TestRegionReplicasWithRestartScenarios.java
>
>
> While trying out region replicas and their assignment, I can see that 
> sometimes the default LB (the Stochastic load balancer) assigns replica 
> regions to the same RS. This happens when we have 3 RSes checked in and a 
> table with 3 replicas. When an RS goes down, replicas being assigned to the 
> same RS is acceptable, but when we have enough RSes to spread them out this 
> behaviour is undesirable and defeats the purpose of replicas. 
> [~huaxiang] and [~enis]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)