[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS

huaxiang sun (JIRA) Thu, 19 Oct 2017 10:59:14 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211449#comment-16211449
 ]


huaxiang sun commented on HBASE-18946:
--------------------------------------

Thanks [~ram_krish]. One possible slowdown with this approach: if queueAll() 
queues more than assignDispatchWaitQueueMaxSize regions, the current logic 
still waits a bit (assignDispatchWaitMillis) before dispatching them. Please see

https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1639.

With the previous logic, queuing the first region starts a wait of 
assignDispatchWaitMillis before the real work begins. With the patch, the whole 
batch is added at once, so the addFirstOne logic is skipped. I think 
waitOnAssignQueue() can be changed as follows to avoid this case:

{code}
  private HashMap<RegionInfo, RegionStateNode> waitOnAssignQueue() {
    HashMap<RegionInfo, RegionStateNode> regions = null;

    assignQueueLock.lock();
    try {
      if (pendingAssignQueue.isEmpty() && isRunning()) {
        assignQueueFullCond.await();
      }

      if (!isRunning()) return null;
      +if (pendingAssignQueue.size() < assignDispatchWaitQueueMaxSize) {
      +  assignQueueFullCond.await(assignDispatchWaitMillis, TimeUnit.MILLISECONDS);
      +}
      -assignQueueFullCond.await(assignDispatchWaitMillis, TimeUnit.MILLISECONDS);
      regions = new HashMap<RegionInfo, RegionStateNode>(pendingAssignQueue.size());
      for (RegionStateNode regionNode: pendingAssignQueue) {
        regions.put(regionNode.getRegionInfo(), regionNode);
      }
      pendingAssignQueue.clear();
    } catch (InterruptedException e) {
      LOG.warn("got interrupted ", e);
      Thread.currentThread().interrupt();
    } finally {
      assignQueueLock.unlock();
    }
    return regions;
  }

{code}
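
To make the intent of the change easier to see in isolation, here is a minimal 
standalone sketch of the same idea. It is not the AssignmentManager code: the 
class BatchDispatchQueue, its String payload, and the names maxSize and 
waitMillis are hypothetical stand-ins for pendingAssignQueue, 
assignDispatchWaitQueueMaxSize and assignDispatchWaitMillis. The sketch also 
uses a while loop around the untimed await, which differs from the method 
above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the assign queue; not HBase code.
class BatchDispatchQueue {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private final List<String> pending = new ArrayList<>();
  private final int maxSize;      // plays the role of assignDispatchWaitQueueMaxSize
  private final long waitMillis;  // plays the role of assignDispatchWaitMillis

  BatchDispatchQueue(int maxSize, long waitMillis) {
    this.maxSize = maxSize;
    this.waitMillis = waitMillis;
  }

  // Adds a whole batch at once and wakes up the dispatcher.
  void queueAll(List<String> items) {
    lock.lock();
    try {
      pending.addAll(items);
      changed.signal();
    } finally {
      lock.unlock();
    }
  }

  // Blocks until something is queued, then drains the queue.
  // The timed wait is skipped when the queue is already at or above maxSize.
  List<String> waitOnQueue() {
    List<String> batch = new ArrayList<>();
    lock.lock();
    try {
      while (pending.isEmpty()) {
        changed.await();
      }
      if (pending.size() < maxSize) {
        // Give other producers a short window to add more items.
        changed.await(waitMillis, TimeUnit.MILLISECONDS);
      }
      batch.addAll(pending);
      pending.clear();
    } catch (InterruptedException e) {
      // Restore the interrupt flag and return whatever was collected (nothing).
      Thread.currentThread().interrupt();
    } finally {
      lock.unlock();
    }
    return batch;
  }
}
```

With this shape, a batch larger than maxSize queued through queueAll() is 
drained without the extra delay, while a smaller batch still waits up to 
waitMillis for more items to arrive.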

> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
>                 Key: HBASE-18946
>                 URL: https://issues.apache.org/jira/browse/HBASE-18946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha-3
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18946.patch, HBASE-18946.patch, 
> TestRegionReplicasWithRestartScenarios.java
>
>
> While trying out region replicas and their assignment, I can see that 
> sometimes the default LB (Stochastic load balancer) assigns replica regions 
> to the same RS. This happens when we have 3 RSs checked in and a table with 3 
> replicas. When an RS goes down, replicas being assigned to the same RS is 
> acceptable, but when we have enough RSs to assign to, this behaviour is 
> undesirable and defeats the purpose of replicas. 
> [~huaxiang] and [~enis]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
