[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539878#comment-17539878 ]
David Manning commented on HBASE-27054:
---------------------------------------

I see some good results from changing the cost function weights. I will propose a PR with those changes.
{code:java}
conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f);
conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f);
{code}
If I make only one change, reducing {{maxRunningTime}} from 180s to 30s, I see a 100% failure rate. If I also make the above cost function weight updates, I see a 100% pass rate, even with a {{maxRunningTime}} of 15s.

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27054
>                 URL: https://issues.apache.org/jira/browse/HBASE-27054
>             Project: HBase
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster. Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]?
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster  Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60. server=srv1351292323,46522,-3543799643652531264 , load=59
> 	at org.junit.Assert.fail(Assert.java:89)
> 	at org.junit.Assert.assertTrue(Assert.java:42)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
> 	at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster  Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. server=srv1402325691,7995,26308078476749652 , load=61
> 	at org.junit.Assert.fail(Assert.java:89)
> 	at org.junit.Assert.assertTrue(Assert.java:42)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
> 	at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
> 	at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)