[
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539878#comment-17539878
]
David Manning commented on HBASE-27054:
---------------------------------------
I see some good results by changing the cost function weights. I will propose a
PR with those changes.
{code:java}
conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f);
conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f);
{code}
If I change only {{maxRunningTime}}, from 180s to 30s, I see a 100% failure
rate. With the cost function weight updates above, I see a 100% pass rate,
even with a {{maxRunningTime}} of 15s.
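
As a rough illustration of why zeroing those two weights could matter, here is
a toy sketch. It assumes the balancer keeps a candidate plan only when a
weighted combination of cost terms goes down. The class name, the three-term
layout, the weights, and every cost value are invented for illustration; this
is not the StochasticLoadBalancer code.

```java
/** Toy model of a weighted balancer cost. Not HBase's actual implementation. */
final class WeightedCostSketch {

    // Index of each hypothetical cost term in the arrays below.
    static final int REGION_COUNT_SKEW = 0;
    static final int MOVE = 1;
    static final int TABLE_SKEW = 2;

    /** Weighted average of per-term costs; each cost is assumed to lie in [0, 1]. */
    static double totalCost(double[] weights, double[] costs) {
        double weighted = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weighted += weights[i] * costs[i];
            weightSum += weights[i];
        }
        // With no active terms there is nothing to optimize.
        return weightSum == 0.0 ? 0.0 : weighted / weightSum;
    }

    /** A candidate plan is kept only if it lowers the total cost. */
    static boolean accepts(double[] weights, double[] current, double[] candidate) {
        return totalCost(weights, candidate) < totalCost(weights, current);
    }

    public static void main(String[] args) {
        // Invented numbers: a server that is off by one is a small region-count skew.
        double[] current = {0.02, 0.0, 0.1};
        // The candidate removes that skew but needs moves and worsens table skew.
        double[] candidate = {0.0, 0.9, 0.3};

        double[] illustrativeWeights = {500, 7, 35}; // all three terms active
        double[] zeroedWeights = {500, 0, 0};        // move and table skew weights set to 0

        System.out.println("all terms weighted: " + accepts(illustrativeWeights, current, candidate));
        System.out.println("two terms zeroed:   " + accepts(zeroedWeights, current, candidate));
    }
}
```

With these made-up inputs the candidate is rejected while all three terms are
weighted and accepted once the move and table skew weights are zero. Whether
the real test behaves this way for the same reason is an assumption, not
something measured here.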
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
> is flaky
> -----------------------------------------------------------------------------------------------
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
> Issue Type: Test
> Components: test
> Affects Versions: 2.5.0
> Reporter: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster.
> Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]?
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
> Time elapsed: 77.779 s <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
> at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
> Time elapsed: 77.781 s <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60.
> server=srv1402325691,7995,26308078476749652 , load=61
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
> at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
> at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}