[ https://issues.apache.org/jira/browse/HBASE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211831#comment-16211831 ]
Jerry He commented on HBASE-19021:
----------------------------------
More explanation.
In branch-1, RegionStates.getAssignmentsByTable()
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L1115
has a section that handles servers w/o assignments and servers in draining
mode. That section is missing after AMv2.
Draining mode nevertheless ends up working in AMv2, but only via a 'detour'.
The balancer's balanceCluster() can pick a plan that moves regions to the
draining servers, and those regions will be unassigned. In the assign phase,
however, the retainAssignment check compares the plan against the server list
obtained from ServerManager.createDestinationServersList(). That list is a
good list without the draining servers. So it is a detour, but the end
result is ok.
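
To illustrate the detour, here is a minimal sketch of checking a planned
destination against a list that already excludes draining servers. This is
not the HBase code: the class, the method name resolveDestination, the use of
plain strings for servers, and the "first allowed server" fallback are all
assumptions made for the example.

```java
import java.util.List;

// Hypothetical, simplified model: servers are plain strings.
public class AssignDetourSketch {

  /**
   * Returns the planned server if it is an allowed destination.
   * Otherwise falls back to the first allowed destination (an assumed
   * policy for this sketch), or null when no destination is available.
   */
  static String resolveDestination(String plannedServer,
                                   List<String> allowedDestinations) {
    if (allowedDestinations.contains(plannedServer)) {
      return plannedServer;
    }
    return allowedDestinations.isEmpty() ? null : allowedDestinations.get(0);
  }
}
```

With allowed destinations rs1 and rs3, a plan that targets the draining
server rs2 is redirected, while a plan that targets rs3 is kept.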
But I restored the branch-1 behavior, which is to take the draining servers out
of consideration from the beginning.
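
As a rough sketch of what taking draining servers out from the beginning
means: before the assignment map is handed to the balancer, online servers
with no regions are added with an empty list, and draining servers are
removed. The class and method names below are hypothetical, and servers and
regions are modeled as plain strings rather than the real HBase types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model: server and region names are plain strings.
public class DrainingFilterSketch {

  /**
   * Builds the server-to-regions map given to the balancer.
   * Online servers without regions get an empty list, and
   * draining servers are removed entirely.
   */
  static Map<String, List<String>> prepareForBalance(
      Map<String, List<String>> assignments,
      List<String> onlineServers,
      Set<String> drainingServers) {
    // Copy the input so the caller's map is not modified.
    Map<String, List<String>> result = new HashMap<>();
    for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
      result.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    // Servers w/o assignments must still be visible as move targets.
    for (String server : onlineServers) {
      result.putIfAbsent(server, new ArrayList<>());
    }
    // Draining servers are excluded before balancing starts.
    result.keySet().removeAll(drainingServers);
    return result;
  }
}
```

With this shape, the balancer never sees a draining server as a candidate,
so no plan can target one in the first place.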
The balancer's retainAssignment, randomAssignment and roundRobinAssignment all
take a server list as a parameter. We seem to always call
ServerManager.createDestinationServersList() to build that list, so those
paths are all good. Only the big balanceCluster() call has the issue.
> Restore a few important missing logics for balancer in 2.0
> ----------------------------------------------------------
>
> Key: HBASE-19021
> URL: https://issues.apache.org/jira/browse/HBASE-19021
> Project: HBase
> Issue Type: Bug
> Reporter: Jerry He
> Assignee: Jerry He
> Priority: Critical
> Attachments: HBASE-19021-master.patch
>
>
> After looking at the code, and some testing, I see the following things are
> missing for balancer to work properly after AMv2.
> # hbase.master.loadbalance.bytable is not respected. It is always 'bytable'.
> The previous default is cluster wide, not by table.
> # Servers with no assignments are not added for balance consideration.
> # A crashed server is not removed from the in-memory server map in
> RegionStates, which affects balance.
> # The draining marker is not respected when balancing.
> Also try to re-enable {{TestRegionRebalancing}}, which has a
> {{testRebalanceOnRegionServerNumberChange}}