[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891155#comment-15891155
 ] 

Kahlil Oppenheimer commented on HBASE-17707:
--------------------------------------------

{code}
+    conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 35);
{code}
I was trying to reset the config value for each test run, but I have instead 
added the config reset to the individual test.

{code}
+    conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0);
{code}
This value needs to be set low for this test (in my testing, values as high as 
4 worked). If it is too high, at some point TableSkew becomes more costly than 
having duplicate regions on the same server, and 
org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertRegionReplicaPlacement(BalancerTestBase.java:362)
 fails.

{code}
+    conf.setFloat(StochasticLoadBalancer.MIN_COST_NEED_BALANCE_KEY, 0.0f);
+    loadBalancer.setConf(conf);
{code}
The failing mock cluster is {code} new int[]{48, 53} {code}. The balancer 
decides to skip balancing because the mock cluster is not unbalanced enough 
(i.e. totalCost / sumMultiplier < 0.05; here 23.5 / 1062.0 is about 0.022), and 
the test then fails because the cluster never gets balanced. The log prints 
"Skipping load balancing because balanced cluster; total cost is 23.5, sum 
multiplier is 1062.0 min cost which need balance is 0.05"
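
To make the skip condition concrete, here is a minimal sketch of the check as 
described above. It is not the HBase source: the class and method names are 
hypothetical, and only the comparison totalCost / sumMultiplier against the 
minimum cost comes from the log message.

```java
class NeedsBalanceSketch {

    // Hypothetical helper: returns true when the normalized cost reaches the
    // configured minimum, i.e. when the balancer would not skip balancing.
    static boolean needsBalance(double totalCost, double sumMultiplier,
                                double minCostNeedBalance) {
        if (sumMultiplier <= 0) {
            return false; // nothing to normalize against
        }
        return totalCost / sumMultiplier >= minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Values taken from the log line: 23.5 / 1062.0 is about 0.022.
        System.out.println(needsBalance(23.5, 1062.0, 0.05)); // false: balancing skipped
        System.out.println(needsBalance(23.5, 1062.0, 0.0));  // true: threshold of 0 never skips
    }
}
```

With the threshold set to 0.0, as in the patch to the test, any positive cost is 
enough for the balancer to run.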

> New More Accurate TableSkew Balancer/Generator
> ----------------------------------------------
>
>                 Key: HBASE-17707
>                 URL: https://issues.apache.org/jira/browse/HBASE-17707
>             Project: HBase
>          Issue Type: New Feature
>          Components: Balancer
>    Affects Versions: 1.2.0
>         Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>            Reporter: Kahlil Oppenheimer
>            Priority: Minor
>              Labels: patch
>         Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
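
For illustration, the cost computation in the quoted description can be sketched 
as below. This is a minimal sketch under stated assumptions, not the code in the 
attached patches: the class name, method names, and the single maxWeight 
parameter are hypothetical, and the way minimal moves are counted (surplus over 
a floor/ceil target, with the fullest servers given the higher target) is one 
reading of "perfectly balance the table across the cluster".

```java
import java.util.Arrays;

class TableSkewCostSketch {

    // Minimal moves to spread one table evenly; input is regions per server.
    static int minMoves(int[] regionsPerServer) {
        int n = regionsPerServer.length;
        if (n == 0) return 0;
        int total = Arrays.stream(regionsPerServer).sum();
        int floor = total / n;
        int extra = total % n; // this many servers may hold floor + 1 regions
        int[] sorted = regionsPerServer.clone();
        Arrays.sort(sorted); // ascending, so the fullest servers come last
        int moves = 0;
        for (int i = 0; i < n; i++) {
            // Assumption: the fullest servers get the higher target.
            int target = (i >= n - extra) ? floor + 1 : floor;
            moves += Math.max(0, sorted[i] - target);
        }
        return moves;
    }

    // Moves needed in the worst case, where the whole table sits on one server.
    static int worstCaseMoves(int totalRegions, int numServers) {
        if (numServers == 0) return 0;
        int ceil = (totalRegions + numServers - 1) / numServers;
        return totalRegions - ceil;
    }

    // Per-table skew normalized to the range 0-1.
    static double tableSkew(int[] regionsPerServer) {
        int total = Arrays.stream(regionsPerServer).sum();
        int worst = worstCaseMoves(total, regionsPerServer.length);
        return worst == 0 ? 0.0 : (double) minMoves(regionsPerServer) / worst;
    }

    // Square root of a weighted average of the max and mean per-table skew.
    // maxWeight is a hypothetical stand-in for the configurable weights.
    static double cost(int[][] regionsPerServerByTable, double maxWeight) {
        if (regionsPerServerByTable.length == 0) return 0.0;
        double sum = 0.0;
        double max = 0.0;
        for (int[] table : regionsPerServerByTable) {
            double skew = tableSkew(table);
            sum += skew;
            max = Math.max(max, skew);
        }
        double avg = sum / regionsPerServerByTable.length;
        return Math.sqrt(maxWeight * max + (1.0 - maxWeight) * avg);
    }

    public static void main(String[] args) {
        // One table entirely on one server, one table perfectly spread.
        int[][] tables = { {4, 0, 0, 0}, {1, 1, 1, 1} };
        System.out.println(cost(tables, 0.5));
    }
}
```

A larger maxWeight penalizes a single badly skewed table more strongly, while a 
smaller one favors the case where every table is slightly skewed, which matches 
the trade-off the description mentions.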



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
