[
https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389028#comment-17389028
]
Bryan Beaudreault commented on HBASE-26147:
-------------------------------------------
Here's an example of running the balancer in dry run mode using the hbase shell.
*{{Initial status -- cluster is well balanced}}*
{{hbase:003:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{ master2:60000 1627497808648}}
{{4 live servers}}
{{ regionserver1:60020 1627500965175}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3}}
{{ regionserver2:60020 1627498037868}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver3:60020 1627498045321}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=4, snip...}}
{{ regionserver4:60020 1627498059869}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{0 dead servers}}
{{Aggregate load: 0, regions: 13}}
{{Took 0.0667 seconds}}
*{{Turn off balancer so it doesn't mess with test}}*
{{hbase:004:0> balance_switch 'false'}}
{{Previous balancer state : true}}
{{Took 0.0208 seconds}}
{{=> true}}
*{{Force an imbalance on the cluster}}*
{{hbase:005:0> move '95c3921b4757077f16b1804278222072',
'regionserver2,60020,1627498037868'}}
{{Took 1.0711 seconds}}
{{hbase:006:0> move '45db520388ef4f427ba4718e3103585e',
'regionserver2,60020,1627498037868'}}
{{Took 1.0316 seconds}}
{{hbase:007:0> move '75ff1c829866bee6cced10e2b603a1cd',
'regionserver2,60020,1627498037868'}}
{{Took 1.0398 seconds}}
*{{Check status -- regionserver2 has too many regions}}*
{{hbase:008:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{ master2:60000 1627497808648}}
{{4 live servers}}
{{ regionserver1:60020 1627500965175}}
{{ requestsPerSecond=1.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver2:60020 1627498037868}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=6, snip...}}
{{ regionserver3:60020 1627498045321}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver4:60020 1627498059869}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=1, snip...}}
{{0 dead servers}}
{{Aggregate load: 1, regions: 13}}
{{Took 0.0110 seconds}}
*{{Run balancer in dry run mode}}*
{{hbase:009:0> dry_run_balancer}}
{{true}}
{{Took 0.1302 seconds}}
{{=> true}}
*{{Check logs on active master, balancer would have moved 2 regions}}*
{{2021-07-28 20:01:27,254 INFO
org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer: Start Generate
Balance plan for cluster.}}
{{2021-07-28 20:01:27,257 DEBUG
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer:
RegionCountSkewCostFunction sees a total of 4 servers and 13 regions.}}
{{2021-07-28 20:01:27,257 DEBUG
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: We need to load
balance cluster; total cost=141.75213675213675, sum multiplier=685.0;
cost/multiplier to need a balance is 0.05}}
{{2021-07-28 20:01:27,258 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start
StochasticLoadBalancer.balancer, initCost=141.75213675213675,
functionCost=RegionCountSkewCostFunction : (500.0, 0.2222222222222222);
PrimaryRegionCountSkewCostFunction : (500.0, 0.0); MoveCostFunction :
(100.0, 0.0); ServerLocalityCostFunction : (25.0, 0.0);
TableSkewCostFunction : (35.0, 0.5897435897435898);
RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction
: (10000.0, 0.0); ReadRequestCostFunction : (10.0, 1.0);
WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : (5.0, 0.0);
StoreFileCostFunction : (5.0, 0.0); computedMaxSteps: 41600}}
{{2021-07-28 20:01:27,380 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished
computing new load balance plan. Computation took PT0.122S to try 41600
different iterations. Found a solution that moves 2 regions; Going from a
computed cost of 141.75213675213675 to a new cost of
38.84615384615385}}
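The figures in these log lines can be checked by hand. The following is a rough sketch, not the balancer's actual code: it assumes the total cost is the sum of multiplier times cost over the functions listed in functionCost, and that the three functions named in ASSUMED_INACTIVE do not count towards "sum multiplier" on this cluster. That second assumption is only made because it reproduces the logged 685.0.

```python
# Values copied from the functionCost log line: name -> (multiplier, cost).
function_cost = {
    "RegionCountSkewCostFunction": (500.0, 0.2222222222222222),
    "PrimaryRegionCountSkewCostFunction": (500.0, 0.0),
    "MoveCostFunction": (100.0, 0.0),
    "ServerLocalityCostFunction": (25.0, 0.0),
    "TableSkewCostFunction": (35.0, 0.5897435897435898),
    "RegionReplicaHostCostFunction": (100000.0, 0.0),
    "RegionReplicaRackCostFunction": (10000.0, 0.0),
    "ReadRequestCostFunction": (10.0, 1.0),
    "WriteRequestCostFunction": (5.0, 0.0),
    "MemStoreSizeCostFunction": (5.0, 0.0),
    "StoreFileCostFunction": (5.0, 0.0),
}

# Assumption (not stated in the log): these functions are inactive here and
# are left out of "sum multiplier". Chosen because it yields the logged 685.0.
ASSUMED_INACTIVE = {
    "PrimaryRegionCountSkewCostFunction",
    "RegionReplicaHostCostFunction",
    "RegionReplicaRackCostFunction",
}

# "cost/multiplier to need a balance" as printed in the log.
MIN_COST_NEED_BALANCE = 0.05

# Weighted sum of every listed cost function.
total_cost = sum(mult * cost for mult, cost in function_cost.values())

# Sum of multipliers over the functions assumed to be active.
sum_multiplier = sum(
    mult for name, (mult, _) in function_cost.items()
    if name not in ASSUMED_INACTIVE
)

ratio = total_cost / sum_multiplier
needs_balance = ratio > MIN_COST_NEED_BALANCE

print(f"total cost     = {total_cost}")
print(f"sum multiplier = {sum_multiplier}")
print(f"ratio          = {ratio:.4f} (threshold {MIN_COST_NEED_BALANCE})")
print(f"needs balance  = {needs_balance}")
```

Under these assumptions the ratio comes out at roughly 0.21, well above the 0.05 threshold, which is consistent with the "We need to load balance cluster" message.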
*{{Check status -- cluster is still imbalanced, dry run mode took no actions}}*
{{hbase:010:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{ master2:60000 1627497808648}}
{{4 live servers}}
{{ regionserver1:60020 1627500965175}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver2:60020 1627498037868}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=6, snip...}}
{{ regionserver3:60020 1627498045321}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver4:60020 1627498059869}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=1, snip...}}
{{0 dead servers}}
{{Aggregate load: 0, regions: 13}}
{{Took 0.0135 seconds}}
*{{Re-enable balancer}}*
{{hbase:011:0> balance_switch 'true'}}
{{Previous balancer state : false}}
{{Took 0.0095 seconds}}
{{=> false}}
*{{Balance the cluster}}*
{{hbase:012:0> balancer}}
{{true}}
{{Took 1.3375 seconds}}
{{=> true}}
*{{Check logs on active master, balancer calculated a similar plan but this time moved regions}}*
{{2021-07-28 20:01:53,375 INFO
org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer: Start Generate
Balance plan for cluster.}}
{{2021-07-28 20:01:53,375 DEBUG
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer:
RegionCountSkewCostFunction sees a total of 4 servers and 13 regions.}}
{{2021-07-28 20:01:53,375 DEBUG
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: We need to load
balance cluster; total cost=141.75213675213675, sum multiplier=685.0;
cost/multiplier to need a balance is 0.05}}
{{2021-07-28 20:01:53,376 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start
StochasticLoadBalancer.balancer, initCost=141.75213675213675,
functionCost=RegionCountSkewCostFunction : (500.0, 0.2222222222222222);
PrimaryRegionCountSkewCostFunction : (500.0, 0.0); MoveCostFunction :
(100.0, 0.0); ServerLocalityCostFunction : (25.0, 0.0);
TableSkewCostFunction : (35.0, 0.5897435897435898);
RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction
: (10000.0, 0.0); ReadRequestCostFunction : (10.0, 1.0);
WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : (5.0, 0.0);
StoreFileCostFunction : (5.0, 0.0); computedMaxSteps: 41600}}
{{2021-07-28 20:01:53,499 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished
computing new load balance plan. Computation took PT0.124S to try 41600
different iterations. Found a solution that moves 2 regions; Going from a
computed cost of 141.75213675213675 to a new cost of
38.84615384615385}}
{{2021-07-28 20:01:53,499 INFO org.apache.hadoop.hbase.master.HMaster: Balancer
plans size is 2, the balance interval is 90000 ms, and the max number regions
in transition is 13}}
{{2021-07-28 20:01:53,499 INFO org.apache.hadoop.hbase.master.HMaster: balance
hri=740b4fdcbc4c68c37c80b98e10be5726, source=regionserver2,60020,1627498037868,
destination=regionserver4,60020,1627498059869}}
{{... snip remaining log spam ...}}
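The dry run and the real run both log the same "Finished computing new load balance plan" summary, so one way to compare them is to pull the numbers out of the master log. The helper below is hypothetical (parse_plan_summary is not part of HBase), and the regex is written only against the message format shown in this comment, so it may need adjusting for other HBase versions. The sample text is the two summary lines from above with the Jira markup removed.

```python
import re

# Matches the plan summary as it appears in the log excerpts above.
PLAN_SUMMARY = re.compile(
    r"Found a solution that moves (?P<moves>\d+) regions; "
    r"Going from a computed cost of (?P<old>\d+(?:\.\d+)?) "
    r"to a new cost of (?P<new>\d+(?:\.\d+)?)"
)


def parse_plan_summary(log_text):
    """Return a list of (regions_moved, old_cost, new_cost) tuples."""
    # Collapse line wrapping so the pattern can assume single spaces.
    flat = " ".join(log_text.split())
    return [
        (int(m["moves"]), float(m["old"]), float(m["new"]))
        for m in PLAN_SUMMARY.finditer(flat)
    ]


sample_log = """
2021-07-28 20:01:27,380 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished
computing new load balance plan. Computation took PT0.122S to try 41600
different iterations. Found a solution that moves 2 regions; Going from a
computed cost of 141.75213675213675 to a new cost of
38.84615384615385
2021-07-28 20:01:53,499 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished
computing new load balance plan. Computation took PT0.124S to try 41600
different iterations. Found a solution that moves 2 regions; Going from a
computed cost of 141.75213675213675 to a new cost of
38.84615384615385
"""

plans = parse_plan_summary(sample_log)
dry_run_plan, real_plan = plans

# Both runs summarize the same plan size and the same costs.
print(dry_run_plan == real_plan)
```

Note that the summary only reports the number of moves and the costs. Whether the two runs picked the same regions would have to be checked against the individual "balance hri=..." lines, which only the real run logs here.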
*{{Final status -- cluster is re-balanced}}*
{{hbase:013:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{ master2:60000 1627497808648}}
{{4 live servers}}
{{ regionserver1:60020 1627500965175}}
{{ requestsPerSecond=1.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver2:60020 1627498037868}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=4, snip...}}
{{ regionserver3:60020 1627498045321}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{ regionserver4:60020 1627498059869}}
{{ requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{0 dead servers}}
{{Aggregate load: 1, regions: 13}}
{{Took 0.0158 seconds}}
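The region counts from the three status outputs can also be tied back to the RegionCountSkewCostFunction value in the log. The sketch below is an assumption about how that value is derived, not a copy of the HBase source: it sums each server's absolute deviation from the mean region count and scales the result between the best case (counts differ by at most one) and the worst case (all regions on one server). It is used here because it reproduces the logged 0.2222222222222222 for the imbalanced cluster.

```python
import math


def region_count_skew(regions_per_server):
    """Scaled skew in [0, 1]: 0 is as even as possible, 1 is all on one server."""
    n = len(regions_per_server)
    total = sum(regions_per_server)
    mean = total / n

    # Total absolute deviation from the mean region count.
    deviation = sum(abs(count - mean) for count in regions_per_server)

    # Worst case: one server holds every region, the rest hold none.
    max_dev = (total - mean) + (n - 1) * mean

    # Best case: total % n servers hold ceil(mean), the others floor(mean).
    num_high = total % n
    num_low = n - num_high
    min_dev = (num_high * (math.ceil(mean) - mean)
               + num_low * (mean - math.floor(mean)))

    if max_dev == min_dev:
        return 0.0
    return (deviation - min_dev) / (max_dev - min_dev)


# numberOfOnlineRegions for regionserver1..4 from the status outputs above.
initial = [3, 3, 4, 3]
imbalanced = [3, 6, 3, 1]
final = [3, 4, 3, 3]

for label, counts in (("initial", initial),
                      ("imbalanced", imbalanced),
                      ("final", final)):
    print(label, region_count_skew(counts))
```

With 13 regions on 4 servers, a perfectly even split is impossible, so a distribution of 3, 3, 3 and 4 already counts as zero skew under this formula.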
> Add dry run mode to hbase balancer
> ----------------------------------
>
> Key: HBASE-26147
> URL: https://issues.apache.org/jira/browse/HBASE-26147
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, master
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> It's often rather hard to know how the cost function changes you're making
> will affect the balance of the cluster, and currently the only way to know is
> to run it. If the cost decisions are not good, you may have just moved many
> regions towards a non-ideal balance. Region moves themselves are not free for
> clients, and the resulting balance may cause a regression.
> We should add a mode to the balancer so that it can be invoked without
> actually executing any plans. This will allow an administrator to iterate on
> their cost functions and use the balancer's logging to see how their changes
> would affect the cluster.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)