[ 
https://issues.apache.org/jira/browse/HBASE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295991#comment-17295991
 ] 

Xiaolin Ha commented on HBASE-25635:
------------------------------------

Hi [~vjasani], I have found the root cause of the failing cases in 
TestStochasticLoadBalancerHeterogeneousCost with this patch.

The test uses only one cost function, `HeterogeneousRegionCountCostFunction`, 
which drives the cluster to distribute regions in a fixed ratio determined by 
server capacity. E.g., if the capacity of server1 is 200 regions, the capacity 
of server2 is 50 regions, and the total region count is 100, then after 
balancing server1 should hold 100*(200/(200+50)) = 80 regions and server2 
should hold 100*(50/(200+50)) = 20 regions. But the original 
RandomCandidateGenerator picks regions based on each RS's region count: when 
server1 has more regions than server2, it generates a MOVE action from server1 
to server2 instead of a MOVE action from server2 to server1. So most actions 
are undone. The test logs are as follows:

!1614940188041-image.png|width=762,height=293!
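
As a quick check of the arithmetic in the example above, here is a minimal sketch that computes the proportional target for each server. The class and method names are hypothetical and are not part of HBase or of the cost function itself.

```java
public class ProportionalTargets {
    // Expected region count for one server: the total region count scaled by
    // that server's share of the total capacity (hypothetical helper).
    static long targetRegions(long totalRegions, long serverCapacity, long totalCapacity) {
        return Math.round((double) totalRegions * serverCapacity / totalCapacity);
    }

    public static void main(String[] args) {
        long total = 100;
        long cap1 = 200;
        long cap2 = 50;
        System.out.println("server1 -> " + targetRegions(total, cap1, cap1 + cap2)); // server1 -> 80
        System.out.println("server2 -> " + targetRegions(total, cap2, cap1 + cap2)); // server2 -> 20
    }
}
```

So in this example a MOVE from server2 to server1 is the useful direction whenever server1 holds fewer than 80 regions, even if it already holds more regions than server2.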

CandidateGenerator#getAction() generates a MOVE action when fromRegion==0 or 
toRegion==0, so it potentially converts some SWAP actions into MOVE actions. 

Combined with the 4000000 balance steps in the test case, this was enough to 
make the test case pass.
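
To illustrate the difference, here is a minimal self-contained sketch of the selection logic with region index 0 treated as valid. The `Kind` enum and the class name are hypothetical stand-ins, not the HBase action classes, and the sketch assumes a negative index means that no region was picked. It is not necessarily the exact change in the PR.

```java
public class GetActionSketch {
    // Hypothetical stand-ins for the balancer's action kinds; not the HBase classes.
    enum Kind { NULL, MOVE, SWAP }

    static Kind getAction(int fromServer, int fromRegion, int toServer, int toRegion) {
        if (fromServer < 0 || toServer < 0) {
            return Kind.NULL;
        }
        if (fromRegion >= 0 && toRegion >= 0) {
            return Kind.SWAP;  // both sides picked a valid region, index 0 included
        } else if (fromRegion >= 0 || toRegion >= 0) {
            return Kind.MOVE;  // only one side picked a valid region
        }
        return Kind.NULL;      // assumption: a negative index means "no region picked"
    }

    public static void main(String[] args) {
        System.out.println(getAction(0, 0, 1, 3));   // SWAP
        System.out.println(getAction(0, 0, 1, -1));  // MOVE
        System.out.println(getAction(0, -1, 1, -1)); // NULL
    }
}
```

With the `> 0` checks, the first call would yield a MOVE rather than a SWAP, and the second would yield no action at all.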

I have fixed the problem in the PR. Could you help to review it? Thanks.

 

> CandidateGenerator may miss some region balance actions
> -------------------------------------------------------
>
>                 Key: HBASE-25635
>                 URL: https://issues.apache.org/jira/browse/HBASE-25635
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: 1614940188041-image.png
>
>
> {color:#172b4d}In CandidateGenerator#getAction(), valid region indexes are 
> greater than or equal to zero, but currently only regions with an index 
> greater than zero can get balance actions.{color}
> {code:java}
> protected BaseLoadBalancer.Cluster.Action getAction(int fromServer, int fromRegion,
>     int toServer, int toRegion) {
>   if (fromServer < 0 || toServer < 0) {
>     return BaseLoadBalancer.Cluster.NullAction;
>   }
>   if (fromRegion > 0 && toRegion > 0) {
>     return new BaseLoadBalancer.Cluster.SwapRegionsAction(fromServer, fromRegion,
>       toServer, toRegion);
>   } else if (fromRegion > 0) {
>     return new BaseLoadBalancer.Cluster.MoveRegionAction(fromRegion, fromServer, toServer);
>   } else if (toRegion > 0) {
>     return new BaseLoadBalancer.Cluster.MoveRegionAction(toRegion, toServer, fromServer);
>   } else {
>     return BaseLoadBalancer.Cluster.NullAction;
>   }
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
