[ https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730875#comment-14730875 ]
Biju Nair commented on HBASE-14215:
-----------------------------------

Primary replica distribution with {{hbase.master.balancer.stochastic.regionReplicaRackCostKey=0}} and {{hbase.master.balancer.stochastic.primaryRegionCountCost=500}}

**Randomly assigned the primaries with balancer off**
|r3n9 |112|
|r1n10 |108|
|r2n9 |116|
|r2n10 |134|
|r1n11 |119|
|r2n11 |109|
|r3n10 |119|
|r3n11 |115|
|r1n9 |95|

**Primary distribution after balancer run**
|r3n9 |114|
|r1n10 |115|
|r2n9 |114|
|r2n10 |114|
|r1n11 |114|
|r2n11 |114|
|r3n10 |114|
|r1n9 |114|
|r3n11 |114|

As expected, the primary replicas seem to be uniformly distributed even with a low cost multiplier for {{hbase.master.balancer.stochastic.primaryRegionCountCost}} when rack awareness is disabled.

> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-14215
>                 URL: https://issues.apache.org/jira/browse/HBASE-14215
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>            Reporter: Biju Nair
>            Priority: Minor
>         Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of total
> primary replica skew doesn't seem to be sufficient to prevent skews
> (see HBASE-14110). We would want the default cost to be a higher value so
> that skews in primary region replicas have a higher cost. The following is the
> test result from setting the multiplier value to 10000 (same as the region
> replica rack cost multiplier) on a 3-rack, 9 RS node cluster, which seems to
> get the balancer to distribute the primaries uniformly.
> *Initial primary replica distribution - using the current multiplier*
> |r1n10|102|
> |r1n11|85|
> |r1n9|88|
> |r2n10|120|
> |r2n11|120|
> |r2n9|124|
> |r3n10|135|
> |r3n11|124|
> |r3n9|129|
> *After a long duration of reads & writes - using the current multiplier*
> |r1n10|102|
> |r1n11|85|
> |r1n9|88|
> |r2n10|120|
> |r2n11|120|
> |r2n9|124|
> |r3n10|135|
> |r3n11|124|
> |r3n9|129|
> *After manual balancing*
> |r1n10|102|
> |r1n11|85|
> |r1n9|88|
> |r2n10|120|
> |r2n11|120|
> |r2n9|124|
> |r3n10|135|
> |r3n11|124|
> |r3n9|129|
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
> |r1n10|114|
> |r1n11|113|
> |r1n9|114|
> |r2n10|114|
> |r2n11|114|
> |r2n9|113|
> |r3n10|115|
> |r3n11|115|
> |r3n9|115|
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 10000
> should help general HBase use.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
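
For readers who want a number for "uniformly distributed", the sketch below computes the difference between the largest and smallest primary count per region server for the two tables in the comment above. It is only an illustration: the helper name {{spread}} and the dictionary names are made up for this example and are not part of HBase, and max minus min is just one simple way to express skew, not the formula the balancer cost function uses.

```python
# Primary replica counts per region server, copied from the comment's two tables.
# The variable names are hypothetical labels for this example only.
random_assignment = {
    "r3n9": 112, "r1n10": 108, "r2n9": 116, "r2n10": 134, "r1n11": 119,
    "r2n11": 109, "r3n10": 119, "r3n11": 115, "r1n9": 95,
}
after_balancer_run = {
    "r3n9": 114, "r1n10": 115, "r2n9": 114, "r2n10": 114, "r1n11": 114,
    "r2n11": 114, "r3n10": 114, "r1n9": 114, "r3n11": 114,
}


def spread(counts):
    """Return max - min of the per-server primary counts (0 means perfectly even)."""
    values = list(counts.values())
    return max(values) - min(values)


for label, counts in (
    ("balancer off, random assignment", random_assignment),
    ("after balancer run", after_balancer_run),
):
    total = sum(counts.values())
    print(f"{label}: total primaries={total}, spread={spread(counts)}")
```

Both tables hold the same total number of primaries, so the drop in spread reflects primaries being moved between servers rather than a change in the number of regions.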