[
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ramkrishna.s.vasudevan updated HBASE-18946:
-------------------------------------------
Attachment: HBASE-18946_2.patch
New patch for assigning replicas:
-> Splits the create table procedure by adding one more step: if the table has
replicas, they are assigned in a new CreateTableStep. To track the current
set of replicas being assigned, a new state is added and serialized.
-> While doing roundRobinAssignment, contacts the AM to learn the current state
of the replica regions and chooses a server accordingly.
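
To make the second point concrete, here is a minimal sketch of a round-robin pick that first checks where the other replicas of the same region already live. It is an illustration only, not the code in the patch: the class ReplicaAwareRoundRobin, the method pickServer and the hostedBy map (a stand-in for what the AM would report) are all hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names are HBase APIs.
class ReplicaAwareRoundRobin {

    /**
     * Walks the server list round-robin from startIndex and returns the first
     * server that does not yet host a replica of primaryRegion.
     * hostedBy maps a primary region name to the servers already hosting one of
     * its replicas (assumed here to come from the assignment manager).
     */
    static String pickServer(List<String> servers, int startIndex,
                             String primaryRegion,
                             Map<String, Set<String>> hostedBy) {
        Set<String> taken = hostedBy.computeIfAbsent(primaryRegion, k -> new HashSet<>());
        String chosen = null;
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get((startIndex + i) % servers.size());
            if (!taken.contains(candidate)) {
                chosen = candidate;
                break;
            }
        }
        if (chosen == null) {
            // More replicas than servers: sharing a server is unavoidable,
            // so fall back to the plain round-robin position.
            chosen = servers.get(startIndex % servers.size());
        }
        taken.add(chosen);
        return chosen;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        Map<String, Set<String>> hostedBy = new HashMap<>();
        // startIndex is 0 on every call to mimic independent assign calls.
        for (int replicaId = 0; replicaId < 3; replicaId++) {
            System.out.println("replica " + replicaId + " -> "
                + pickServer(servers, 0, "regionA", hostedBy));
        }
    }
}
```

With 3 servers and 3 replicas, each replica lands on a different server even though every call starts from the same position. Only when there are more replicas than servers does the sketch put two replicas on one server.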
Note that even in earlier branches the round robin assignment never consulted
the LB about region balancing. It simply did a round robin, and that was good
enough because all the regions that needed to be assigned went as one batch.
So with, say, 30 regions and 3 RS, the round robin worked fine. In trunk that
no longer happens because the Assign procedures run asynchronously. Although
the round robin asks for the current cluster state every time, that cluster
state is based on the set of regions currently in the pipeline to be assigned,
not on the current state of the AM.
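
The difference between the batched and the asynchronous case can be sketched with a toy round robin. This is a simplified illustration, not the BaseLB code: the class and method names are hypothetical, and it assumes every call starts from the first server, which exaggerates the skew compared with the real balancer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the HBase balancer code.
class RoundRobinBatchDemo {

    /** Plain round robin: region i of the batch goes to server (i mod n). */
    static Map<String, List<String>> roundRobin(List<String> regions, List<String> servers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }

    /** Regions per server when all regions are handed over in one call. */
    static Map<String, Integer> assignAsBatch(int regionCount, List<String> servers) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < regionCount; i++) {
            regions.add("region-" + i);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        roundRobin(regions, servers).forEach((server, assigned) -> counts.put(server, assigned.size()));
        return counts;
    }

    /** Regions per server when each region is assigned by its own call. */
    static Map<String, Integer> assignOneByOne(int regionCount, List<String> servers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String server : servers) {
            counts.put(server, 0);
        }
        for (int i = 0; i < regionCount; i++) {
            // Each call sees only its own single-region batch, not earlier calls.
            Map<String, List<String>> plan = roundRobin(Arrays.asList("region-" + i), servers);
            plan.forEach((server, assigned) -> counts.merge(server, assigned.size(), Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        System.out.println("batched:    " + assignAsBatch(30, servers));
        System.out.println("one by one: " + assignOneByOne(30, servers));
    }
}
```

In the batched call the 30 regions spread as 10 per server. When each region arrives in its own call, the toy round robin has no memory of earlier placements and keeps choosing the same server, which is why the assignment needs to look at the actual state held by the AM.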
Now I am very sure the same change has to be done in ServerCrashProcedure to
enable the failed tests, because there too we just go with round robin and the
flow does not contact the LB.
There is one comment in the current BaseLB code
{code}
// TODO: instead of retainAssignment() and roundRobinAssignment(), we should just run the
// normal LB.balancerCluster() with unassignedRegions.
{code}
I think that change has to be done along with the approach in the current patch.
I ran the test in this patch 25 times in a loop and the assignment was done
without any issues. Once this patch is done I will make the change for
ServerCrashProcedure also. Thanks.
> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
> Key: HBASE-18946
> URL: https://issues.apache.org/jira/browse/HBASE-18946
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha-3
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-18946.patch, HBASE-18946.patch,
> HBASE-18946_2.patch, TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replicas and their assignment, I can see that sometimes the
> default LB (Stochastic load balancer) assigns replica regions to the same RS.
> This happens when we have 3 RS checked in and a table with 3
> replicas. When an RS goes down, replicas being assigned to the same RS is
> acceptable, but when we have enough RS to assign to, this behaviour is
> undesirable and defeats the purpose of replicas.
> [~huaxiang] and [~enis].
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)