[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539878#comment-17539878
 ] 

David Manning commented on HBASE-27054:
---------------------------------------

I see some good results by changing the cost function weights. I will propose a 
PR with those changes.
{code:java}
conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f);
conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f); {code}
If I change only {{maxRunningTime}}, reducing it from 180s to 30s, I see a 100% 
failure rate. With the cost function weight updates above, I see a 100% pass 
rate, even with a {{maxRunningTime}} of 15s.
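
One way to picture why the weights could matter here is a toy sketch. It is not the actual StochasticLoadBalancer code, and the comment above does not state the mechanism. It assumes a balancer scores a cluster state as a weighted sum of cost terms and accepts a candidate move only when the total drops. The class name, the weights {500, 7, 35}, and the per-term cost values are all hypothetical and chosen only for illustration.

```java
public class WeightedCostSketch {
    // Order of terms in every array: region count skew, move cost, table skew.
    // All values are hypothetical, not measurements from the test.
    static final double[] BEFORE = {0.0010, 0.000, 0.010};
    // Candidate move: fixes an off-by-one server load (lower region skew),
    // but pays a move cost and slightly worsens table skew.
    static final double[] AFTER = {0.0008, 0.010, 0.012};

    /** Weighted sum of the per-term costs. */
    static double totalCost(double[] weights, double[] costs) {
        double total = 0.0;
        for (int i = 0; i < costs.length; i++) {
            total += weights[i] * costs[i];
        }
        return total;
    }

    /** A candidate is accepted only if it lowers the total cost. */
    static boolean accepts(double[] weights, double[] before, double[] after) {
        return totalCost(weights, after) < totalCost(weights, before);
    }

    public static void main(String[] args) {
        double[] allTermsWeighted = {500, 7, 35}; // hypothetical weights
        double[] moveAndTableZeroed = {500, 0, 0}; // moveCost and tableSkewCost set to 0

        System.out.println("all terms weighted, accepted: "
            + accepts(allTermsWeighted, BEFORE, AFTER));
        System.out.println("moveCost/tableSkewCost zeroed, accepted: "
            + accepts(moveAndTableZeroed, BEFORE, AFTER));
    }
}
```

Under these made-up numbers, the move that evens out region counts is rejected while the other two terms carry weight, and accepted once they are zeroed. Whether this is what happens in the real test is an assumption. The sketch only shows how competing cost terms can block a small region-count fix.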

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27054
>                 URL: https://issues.apache.org/jira/browse/HBASE-27054
>             Project: HBase
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster.
> Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>       at org.junit.Assert.fail(Assert.java:89)
>       at org.junit.Assert.assertTrue(Assert.java:42)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>       at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>       at org.junit.Assert.fail(Assert.java:89)
>       at org.junit.Assert.assertTrue(Assert.java:42)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>       at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>       at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
