[
https://issues.apache.org/jira/browse/HBASE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295991#comment-17295991
]
Xiaolin Ha commented on HBASE-25635:
------------------------------------
Hi, [~vjasani]. I have found the root cause of the failed cases in
TestStochasticLoadBalancerHeterogeneousCost with this patch.
The reason is that TestStochasticLoadBalancerHeterogeneousCost uses only one
cost function, `HeterogeneousRegionCountCostFunction`, and this function drives
the cluster to distribute regions according to a fixed ratio. For example, if
the capacity of server1 is 200 regions, the capacity of server2 is 50 regions,
and the total region count is 100, then after balancing server1 should hold
100*(200/(200+50)) = 80 regions, while server2 should hold
100*(50/(200+50)) = 20 regions. But the original RandomCandidateGenerator takes
the RS region count into account when choosing regions: when server1 > server2,
it generates a MOVE action from server1 to server2 instead of a MOVE action
from server2 to server1. So most actions are undone. The test logs are as
follows:
!1614940188041-image.png|width=762,height=293!
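To make the arithmetic in the example above concrete, here is a minimal sketch of the proportional target it describes. The class and method names are hypothetical and are not part of HBase; the sketch only reproduces the 200/50 capacity example and does not show how `HeterogeneousRegionCountCostFunction` is actually implemented.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: computes the region count each
// server would hold if regions were spread in proportion to capacity.
class CapacityShareSketch {

  static long[] expectedRegions(long totalRegions, long[] capacities) {
    long totalCapacity = Arrays.stream(capacities).sum();
    long[] expected = new long[capacities.length];
    for (int i = 0; i < capacities.length; i++) {
      // Multiply before dividing so integer division does not truncate early.
      expected[i] = totalRegions * capacities[i] / totalCapacity;
    }
    return expected;
  }

  public static void main(String[] args) {
    // server1 capacity = 200, server2 capacity = 50, 100 regions in total.
    long[] expected = expectedRegions(100, new long[] {200, 50});
    System.out.println(Arrays.toString(expected)); // prints [80, 20]
  }
}
```

With these targets server1 is expected to end up with more regions than server2, so a generator that prefers moving regions off the server with the higher region count works against this cost function.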
In CandidateGenerator#getAction(), a MOVE action is generated when
fromRegion==0 OR toRegion==0, which potentially converts some SWAP actions to
MOVE actions. The balance step count in the test case is 4000000, and as a
result the test case passed.
I have fixed the problem in the PR. Could you help to review it? Thanks.
> CandidateGenerator may miss some region balance actions
> -------------------------------------------------------
>
> Key: HBASE-25635
> URL: https://issues.apache.org/jira/browse/HBASE-25635
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Attachments: 1614940188041-image.png
>
>
> In the code of CandidateGenerator#getAction(), valid region indexes are
> greater than or equal to zero, but currently only regions with indexes
> greater than zero can get balance actions.
> {code:java}
> protected BaseLoadBalancer.Cluster.Action getAction(int fromServer, int fromRegion,
>     int toServer, int toRegion) {
>   if (fromServer < 0 || toServer < 0) {
>     return BaseLoadBalancer.Cluster.NullAction;
>   }
>   if (fromRegion > 0 && toRegion > 0) {
>     return new BaseLoadBalancer.Cluster.SwapRegionsAction(fromServer, fromRegion,
>       toServer, toRegion);
>   } else if (fromRegion > 0) {
>     return new BaseLoadBalancer.Cluster.MoveRegionAction(fromRegion, fromServer, toServer);
>   } else if (toRegion > 0) {
>     return new BaseLoadBalancer.Cluster.MoveRegionAction(toRegion, toServer, fromServer);
>   } else {
>     return BaseLoadBalancer.Cluster.NullAction;
>   }
> }{code}
>
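The effect of the `> 0` checks in the quoted getAction() can be seen in a standalone sketch. The class below is hypothetical and does not use any HBase types: the strings "SWAP", "MOVE" and "NULL" stand in for SwapRegionsAction, MoveRegionAction and NullAction. The `classifyInclusive` variant is only an assumed illustration of treating index 0 as valid (with a negative index assumed to mean that no region was picked); it is not necessarily the change made in the PR.

```java
// Standalone sketch; the string results stand in for HBase's
// SwapRegionsAction, MoveRegionAction and NullAction.
class GetActionSketch {

  // Mirrors the branch structure of the quoted getAction():
  // region index 0 is treated the same as "no region".
  static String classifyCurrent(int fromServer, int fromRegion, int toServer, int toRegion) {
    if (fromServer < 0 || toServer < 0) {
      return "NULL";
    }
    if (fromRegion > 0 && toRegion > 0) {
      return "SWAP";
    } else if (fromRegion > 0 || toRegion > 0) {
      return "MOVE";
    }
    return "NULL";
  }

  // Assumed variant: index 0 counts as a valid region and only a
  // negative index means that no region was picked.
  static String classifyInclusive(int fromServer, int fromRegion, int toServer, int toRegion) {
    if (fromServer < 0 || toServer < 0) {
      return "NULL";
    }
    if (fromRegion >= 0 && toRegion >= 0) {
      return "SWAP";
    } else if (fromRegion >= 0 || toRegion >= 0) {
      return "MOVE";
    }
    return "NULL";
  }

  public static void main(String[] args) {
    // Region 0 on server 0 paired with region 3 on server 1.
    System.out.println(classifyCurrent(0, 0, 1, 3));   // prints MOVE
    System.out.println(classifyInclusive(0, 0, 1, 3)); // prints SWAP
    // Region 0 on server 0 and no region picked on server 1.
    System.out.println(classifyCurrent(0, 0, 1, -1));   // prints NULL
    System.out.println(classifyInclusive(0, 0, 1, -1)); // prints MOVE
  }
}
```

Under the current checks a pair that involves region index 0 is downgraded from a swap to a move, and a lone region 0 produces no action at all, which matches the "may miss some region balance actions" summary of this issue.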