[jira] [Updated] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS

ramkrishna.s.vasudevan (JIRA) Mon, 20 Nov 2017 04:15:07 -0800

     [ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ramkrishna.s.vasudevan updated HBASE-18946:
-------------------------------------------
    Attachment: HBASE-18946_2.patch

New patch for assigning replicas:
-> Splits the create table procedure by adding one more step: if the table has 
replicas, they are assigned in a new CreateTableStep. To track the set of 
replicas currently being assigned, a new state is added and serialized.
-> While doing roundRobinAssignment, contacts the AM to learn the current state 
of the replica regions and chooses a server accordingly.
Note that even in earlier branches the round robin assignment never talked to 
the LB to learn about region balancing. It just did a plain round robin, and 
that was good enough because all the regions that needed to be assigned went 
as one batch. So with, say, 30 regions and 3 RS, the round robin worked fine. 
In trunk that no longer happens because the Assign procedures run 
asynchronously. Though the round robin asks for the current cluster state every 
time, that cluster state is based on the set of regions currently in the 
pipeline to be assigned, not on the current state of the AM.
Now I am very sure the same change has to be made in ServerCrashProcedure to 
enable the failed tests, because there too we just go with round robin and the 
flow does not contact the LB.
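To make the idea concrete, here is a minimal sketch of a round robin that looks 
at already-known replica placements before picking a server. It is not the code 
in the patch and uses no HBase classes: Replica, assign and the server names 
are hypothetical stand-ins, and the alreadyAssigned map only plays the role of 
the state the AM would provide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative only: none of these names come from HBase.
class ReplicaAwareRoundRobin {

    /** Hypothetical stand-in for a region replica: regionId names the region, replicaId the copy. */
    static final class Replica {
        final String regionId;
        final int replicaId;

        Replica(String regionId, int replicaId) {
            this.regionId = regionId;
            this.replicaId = replicaId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Replica)) return false;
            Replica other = (Replica) o;
            return replicaId == other.replicaId && regionId.equals(other.regionId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(regionId, replicaId);
        }
    }

    /**
     * Round robin over servers, skipping any server that already hosts a replica
     * of the same region. alreadyAssigned is the assumed "current state" input.
     */
    static Map<Replica, String> assign(List<Replica> toAssign, List<String> servers,
                                       Map<Replica, String> alreadyAssigned) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        // regionId -> servers that already host some replica of that region
        Map<String, Set<String>> hostsByRegion = new HashMap<>();
        for (Map.Entry<Replica, String> e : alreadyAssigned.entrySet()) {
            hostsByRegion.computeIfAbsent(e.getKey().regionId, k -> new HashSet<>())
                         .add(e.getValue());
        }
        Map<Replica, String> plan = new LinkedHashMap<>();
        int next = 0; // round robin cursor
        for (Replica replica : toAssign) {
            Set<String> used =
                hostsByRegion.computeIfAbsent(replica.regionId, k -> new HashSet<>());
            int offset = 0;
            while (offset < servers.size()
                    && used.contains(servers.get((next + offset) % servers.size()))) {
                offset++;
            }
            if (offset == servers.size()) {
                offset = 0; // every server already has a copy: colocation is unavoidable
            }
            String chosen = servers.get((next + offset) % servers.size());
            next = (next + offset + 1) % servers.size();
            used.add(chosen);
            plan.put(replica, chosen);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<Replica, String> existing = new HashMap<>();
        existing.put(new Replica("A", 0), "rs1"); // primary already placed on rs1
        Map<Replica, String> plan = assign(
            Arrays.asList(new Replica("A", 1), new Replica("A", 2)),
            Arrays.asList("rs1", "rs2", "rs3"),
            existing);
        for (Map.Entry<Replica, String> e : plan.entrySet()) {
            System.out.println(e.getKey().regionId + "/" + e.getKey().replicaId
                + " -> " + e.getValue());
        }
    }
}
```

In the example in main, rs1 already hosts replica 0 of region A, so replicas 1 
and 2 go to rs2 and rs3. A plain round robin that ignored the existing 
placement would have started again at rs1.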
There is one comment in the current BaseLB code
{code}
    // TODO: instead of retainAssignment() and roundRobinAssignment(), we should just run the
    // normal LB.balancerCluster() with unassignedRegions.
{code}
I think that change has to be done along with the approach taken in the current 
patch. I ran the test in this patch 25 times in a loop and the assignment 
completed without any issues. Once this patch is done I will make the change 
for ServerCrashProcedure as well. Thanks.


> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
>                 Key: HBASE-18946
>                 URL: https://issues.apache.org/jira/browse/HBASE-18946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha-3
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18946.patch, HBASE-18946.patch, 
> HBASE-18946_2.patch, TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replicas and their assignment, I can see that sometimes the 
> default LB, the stochastic load balancer, assigns replica regions to the same 
> RS. This happens when we have 3 RS checked in and a table with 3 replicas. 
> When an RS goes down, replicas being assigned to the same RS is acceptable, 
> but when we have enough RS to assign to, this behaviour is undesirable and 
> defeats the purpose of replicas. 
> [~huaxiang] and [~enis]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
