[
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261768#comment-16261768
]
stack commented on HBASE-18946:
-------------------------------
bq. While doing roundrobinAssignment contact the AM to know the current state
of replica regions and choose a server accordingly.
Do we only do this when it is a region with replicas, or do we do it always?
(It would be good if the former; we want assignment to run fast.)
Yeah, if round robin, it's round robin (smile).
Please remind me what is the rule for replica assign? Just that they need to be
on different servers? Nothing about ordering? (Hmm... seems like replica has to
go out first). How does the patch to the balancer ensure this ordering?
Is there a hole where you can't see an ongoing Assignment? It has been queued
and is being worked on, but you have no means of querying where a region is
being assigned (i.e. we are about to assign a replica and we want to avoid
assigning to the same location as where we just assigned).
If round robin, are we not moving through the list of servers? Is the issue
only when cluster is small -- three servers or so?
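To make the question concrete, here is a minimal sketch of replica-aware round robin. It is not the patch and not HBase API: the class ReplicaAwareRoundRobin, the Replica type and the assign method are hypothetical names, and servers are plain strings. The assumption is that the only rule is "no two replicas of a region on the same server", with plain round robin as the fallback when every server already hosts one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReplicaAwareRoundRobin {

  /** Hypothetical stand-in for a region replica: region name plus replica id. */
  static final class Replica {
    final String region;
    final int replicaId;

    Replica(String region, int replicaId) {
      this.region = region;
      this.replicaId = replicaId;
    }

    @Override
    public String toString() {
      return region + "_" + replicaId;
    }
  }

  /**
   * Walks the server list round robin, skipping any server that already hosts
   * a replica of the same region. If every server hosts one, falls back to
   * the plain round-robin choice so the replica is still assigned.
   */
  static Map<String, List<Replica>> assign(List<Replica> replicas, List<String> servers) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("no servers to assign to");
    }
    Map<String, List<Replica>> plan = new LinkedHashMap<>();
    Map<String, Set<String>> regionsOnServer = new HashMap<>();
    for (String server : servers) {
      plan.put(server, new ArrayList<>());
      regionsOnServer.put(server, new HashSet<>());
    }
    int numServers = servers.size();
    int serverIdx = 0;
    for (Replica replica : replicas) {
      int chosen = serverIdx; // fallback: plain round robin
      for (int j = 0; j < numServers; j++) {
        int candidate = (serverIdx + j) % numServers;
        if (!regionsOnServer.get(servers.get(candidate)).contains(replica.region)) {
          chosen = candidate;
          break;
        }
      }
      String server = servers.get(chosen);
      plan.get(server).add(replica);
      regionsOnServer.get(server).add(replica.region);
      serverIdx = (chosen + 1) % numServers; // resume from the next server
    }
    return plan;
  }

  public static void main(String[] args) {
    List<Replica> replicas = new ArrayList<>();
    for (int id = 0; id < 3; id++) {
      for (String region : new String[] {"A", "B", "C"}) {
        replicas.add(new Replica(region, id));
      }
    }
    System.out.println(assign(replicas, Arrays.asList("s0", "s1", "s2")));
  }
}
```

With three servers and three replicas per region, the skip keeps every region's replicas on distinct servers even though a plain walk through the list would collide. The fallback only kicks in when a region has more replicas than there are servers. The sketch sees its own in-progress plan, which is exactly the visibility the question above is asking about for queued assignments.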
On patch, don't renumber protobuf fields.
What is happening here (BTW, repeats code):
{code}
List<RegionInfo> serverRegions =
    assignments.computeIfAbsent(serverName, k -> new ArrayList<>());
if (!RegionReplicaUtil.isDefaultReplica(region)) {
  if (!replicaAvailable(region, serverName)) {
    assignRegionToServer(cluster, serverName, serverRegions, region);
    serverIdx = (j + serverIdx + 1) % numServers;
    assigned = true;
    break;
  }
} else if (!cluster.wouldLowerAvailability(region, serverName)) {
  assignRegionToServer(cluster, serverName, serverRegions, region);
  serverIdx = (j + serverIdx + 1) % numServers; // remain from next server
...
{code}
If NOT isDefaultReplica and NOT replicaAvailable, we just fall through?
Good stuff.
> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
> Key: HBASE-18946
> URL: https://issues.apache.org/jira/browse/HBASE-18946
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha-3
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-18946.patch, HBASE-18946.patch,
> HBASE-18946_2.patch, HBASE-18946_2.patch,
> TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replicas and their assignment, I can see that sometimes the
> default LB, the Stochastic load balancer, assigns replica regions to the same
> RS. This happens when we have 3 RS checked in and we have a table with 3
> replicas. When a RS goes down, assigning replicas to the same RS is
> acceptable, but when we have enough RS to assign to, this behaviour is
> undesirable and defeats the purpose of replicas.
> [~huaxiang] and [~enis].