[ 
https://issues.apache.org/jira/browse/HBASE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211831#comment-16211831
 ] 

Jerry He edited comment on HBASE-19021 at 10/19/17 10:09 PM:
-------------------------------------------------------------

More explanation.

In branch-1, RegionStates.getAssignmentsByTable()
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L1115
has a part that deals with servers without assignments and with draining 
mode.  This part is missing after AMv2.

Draining mode still ends up ok after AMv2, but only via a 'detour'.
The balancer's balanceCluster() can pick a plan to move regions to the draining 
servers, and those regions will be 'unassigned'. In the 'assign' phase, 
however, the retainAssignment check compares the plan against the server list 
obtained from ServerManager.createDestinationServersList().  This list is a 
good list without the draining servers, so the draining servers are not chosen 
as destinations. It is a detour, but the end result is ok.
Even so, I restored the branch-1 behavior, which is to take the draining 
servers out of consideration from the beginning.
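
To illustrate what "out of consideration from the beginning" could look like, 
here is a minimal sketch, not the actual patch. It assumes plain strings stand 
in for server names and regions; the class BalancerInputSketch and the method 
buildBalancerInput are hypothetical names, not HBase APIs. It adds servers 
that have no assignments and drops draining servers before the map would be 
handed to a balancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: strings stand in for HBase's server and region types.
public class BalancerInputSketch {

  // Builds the server -> regions map that a balancer would be given.
  static Map<String, List<String>> buildBalancerInput(
      Map<String, List<String>> currentAssignments,
      List<String> onlineServers,
      Set<String> drainingServers) {
    // Copy so the caller's map and lists are left untouched.
    Map<String, List<String>> result = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : currentAssignments.entrySet()) {
      result.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    // Servers hosting no regions must still be visible to the balancer,
    // otherwise nothing can ever be moved onto them.
    for (String server : onlineServers) {
      result.putIfAbsent(server, new ArrayList<>());
    }
    // Assumption: draining servers are dropped entirely up front, so they
    // are neither sources nor targets of a balance plan.
    result.keySet().removeAll(drainingServers);
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> current = new TreeMap<>();
    current.put("rs1", List.of("regionA", "regionB"));
    current.put("rs2", List.of("regionC"));
    Map<String, List<String>> input = buildBalancerInput(
        current, List.of("rs1", "rs2", "rs3"), Set.of("rs2"));
    System.out.println(input); // prints {rs1=[regionA, regionB], rs3=[]}
  }
}
```

In this example rs3 shows up with an empty region list and the draining rs2 
is gone, so a plan built from this map cannot target rs2 in the first place.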

The balancer's retainAssignment, randomAssignment and roundRobinAssignment all 
take a server list as a parameter.  We seem to always call 
ServerManager.createDestinationServersList() to build that list, so they are 
all good.  Only the big balanceCluster() call has the issue.
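
A rough sketch of why these calls are safe, under stated assumptions: 
destinationServers below is a hypothetical stand-in for what 
ServerManager.createDestinationServersList() is described as returning (online 
servers minus draining ones), and roundRobin is a simplified illustration, not 
the balancer's real roundRobinAssignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: strings stand in for HBase's server and region types.
public class DestinationListSketch {

  // Stand-in for the destination list: online servers without draining ones.
  static List<String> destinationServers(List<String> online, Set<String> draining) {
    List<String> result = new ArrayList<>(online);
    result.removeAll(draining);
    return result;
  }

  // Simplified round-robin: it can only pick servers from the list it is given.
  static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("no destination servers");
    }
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      plan.put(regions.get(i), servers.get(i % servers.size()));
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> servers =
        destinationServers(List.of("rs1", "rs2", "rs3"), Set.of("rs2"));
    Map<String, String> plan =
        roundRobin(List.of("regionA", "regionB", "regionC"), servers);
    System.out.println(plan); // prints {regionA=rs1, regionB=rs3, regionC=rs1}
  }
}
```

Because the assignment function only sees the pre-filtered list, a draining 
server can never be chosen, which is the property balanceCluster() lacked.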


> Restore a few important missing logics for balancer in 2.0
> ----------------------------------------------------------
>
>                 Key: HBASE-19021
>                 URL: https://issues.apache.org/jira/browse/HBASE-19021
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Critical
>         Attachments: HBASE-19021-master.patch, HBASE-19021-master.patch
>
>
> After looking at the code and doing some testing, I see the following things 
> are missing for the balancer to work properly after AMv2.
> # hbase.master.loadbalance.bytable is not respected. It is always 'bytable'. 
> The previous default is cluster wide, not by table.
> # Servers with no assignments are not added for balance consideration.
> # A crashed server is not removed from the in-memory server map in 
> RegionStates, which affects balance.
> # The draining marker is not respected when balancing.
> Also try to re-enable {{TestRegionRebalancing}}, which has a 
> {{testRebalanceOnRegionServerNumberChange}} test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)