[jira] [Commented] (HBASE-26147) Add dry run mode to hbase balancer

Bryan Beaudreault (Jira) Wed, 28 Jul 2021 13:18:12 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389028#comment-17389028
 ]


Bryan Beaudreault commented on HBASE-26147:
-------------------------------------------

Here's an example of running the balancer in dry run mode using the hbase shell.

*{{Initial status -- cluster is well balanced}}*

{{hbase:003:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{  master2:60000 1627497808648}}
{{4 live servers}}
{{  regionserver1:60020 1627500965175}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3}}
{{  regionserver2:60020 1627498037868}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver3:60020 1627498045321}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=4, snip...}}
{{  regionserver4:60020 1627498059869}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{0 dead servers}}
{{Aggregate load: 0, regions: 13}}
{{Took 0.0667 seconds}}

*{{Turn off the balancer so it doesn't interfere with the test}}*

{{hbase:004:0> balance_switch 'false'}}
{{Previous balancer state : true}}
{{Took 0.0208 seconds}}
{{=> true}}

*{{Force an imbalance on the cluster}}*

{{hbase:005:0> move '95c3921b4757077f16b1804278222072', 
'regionserver2,60020,1627498037868'}}
{{Took 1.0711 seconds}}
{{hbase:006:0> move '45db520388ef4f427ba4718e3103585e', 
'regionserver2,60020,1627498037868'}}
{{Took 1.0316 seconds}}
{{hbase:007:0> move '75ff1c829866bee6cced10e2b603a1cd', 
'regionserver2,60020,1627498037868'}}
{{Took 1.0398 seconds}}

*{{Check status -- regionserver2 has too many regions}}*

{{hbase:008:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{  master2:60000 1627497808648}}
{{4 live servers}}
{{  regionserver1:60020 1627500965175}}
{{    requestsPerSecond=1.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver2:60020 1627498037868}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=6, snip...}}
{{  regionserver3:60020 1627498045321}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver4:60020 1627498059869}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=1, snip...}}
{{0 dead servers}}
{{Aggregate load: 1, regions: 13}}
{{Took 0.0110 seconds}}

*{{Run balancer in dry run mode}}*

{{hbase:009:0> dry_run_balancer}}
{{true}}
{{Took 0.1302 seconds}}
{{=> true}}

*{{Check logs on active master, balancer would have moved 2 regions}}*

{{2021-07-28 20:01:27,254 INFO 
org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer: Start Generate 
Balance plan for cluster.}}
{{2021-07-28 20:01:27,257 DEBUG 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: 
RegionCountSkewCostFunction sees a total of 4 servers and 13 regions.}}
{{2021-07-28 20:01:27,257 DEBUG 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: We need to load 
balance cluster; total cost=141.75213675213675, sum multiplier=685.0; 
cost/multiplier to need a balance is 0.05}}
{{2021-07-28 20:01:27,258 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start 
StochasticLoadBalancer.balancer, initCost=141.75213675213675, 
functionCost=RegionCountSkewCostFunction : (500.0, 0.2222222222222222); 
PrimaryRegionCountSkewCostFunction : (500.0, 0.0); MoveCostFunction : 
(100.0, 0.0); ServerLocalityCostFunction : (25.0, 0.0); 
TableSkewCostFunction : (35.0, 0.5897435897435898); 
RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction 
: (10000.0, 0.0); ReadRequestCostFunction : (10.0, 1.0); 
WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : (5.0, 0.0); 
StoreFileCostFunction : (5.0, 0.0); computedMaxSteps: 41600}}
{{2021-07-28 20:01:27,380 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished 
computing new load balance plan. Computation took PT0.122S to try 41600 
different iterations. Found a solution that moves 2 regions; Going from a 
computed cost of 141.75213675213675 to a new cost of 38.84615384615385}}
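
The initCost reported above can be reproduced from the functionCost entries if each pair is read as (multiplier, cost) and the total is the sum of multiplier times cost. The Python sketch below recomputes it from the logged pairs. The names are illustrative only, and the (multiplier, cost) reading and the "ratio at or above the threshold" check are assumptions that the logged numbers happen to support. The sum multiplier (685.0) and threshold (0.05) are copied from the log, not derived.

```python
# (multiplier, cost) pairs copied from the functionCost log line above.
# Reading each pair as (multiplier, cost) is an assumption.
FUNCTION_COSTS = {
    "RegionCountSkewCostFunction": (500.0, 0.2222222222222222),
    "PrimaryRegionCountSkewCostFunction": (500.0, 0.0),
    "MoveCostFunction": (100.0, 0.0),
    "ServerLocalityCostFunction": (25.0, 0.0),
    "TableSkewCostFunction": (35.0, 0.5897435897435898),
    "RegionReplicaHostCostFunction": (100000.0, 0.0),
    "RegionReplicaRackCostFunction": (10000.0, 0.0),
    "ReadRequestCostFunction": (10.0, 1.0),
    "WriteRequestCostFunction": (5.0, 0.0),
    "MemStoreSizeCostFunction": (5.0, 0.0),
    "StoreFileCostFunction": (5.0, 0.0),
}

# Values taken verbatim from the "We need to load balance cluster" log line.
SUM_MULTIPLIER = 685.0
MIN_COST_NEED_BALANCE = 0.05


def total_cost(function_costs):
    """Sum multiplier * cost over every cost function."""
    return sum(multiplier * cost for multiplier, cost in function_costs.values())


def needs_balance(total, sum_multiplier, threshold):
    """Assumed rule: balance when total / sum_multiplier reaches the threshold."""
    return total / sum_multiplier >= threshold


if __name__ == "__main__":
    total = total_cost(FUNCTION_COSTS)
    print("total cost:", total)
    print("cost/multiplier:", total / SUM_MULTIPLIER)
    print("needs balance:", needs_balance(total, SUM_MULTIPLIER, MIN_COST_NEED_BALANCE))
```

Only three functions contribute here (region count skew, table skew, read requests). Their weighted sum comes to about 141.752, and the ratio of roughly 0.207 is well above 0.05, which is consistent with the balancer deciding to generate a plan.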

*{{Check status -- cluster is still imbalanced, dry run mode took no actions}}*

{{hbase:010:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{  master2:60000 1627497808648}}
{{4 live servers}}
{{  regionserver1:60020 1627500965175}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver2:60020 1627498037868}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=6, snip...}}
{{  regionserver3:60020 1627498045321}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver4:60020 1627498059869}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=1, snip...}}
{{0 dead servers}}
{{Aggregate load: 0, regions: 13}}
{{Took 0.0135 seconds}}
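
The region counts in this status output (3, 6, 3, 1) can be tied back to the RegionCountSkewCostFunction value of 0.2222222222222222 in the master log. The Python sketch below assumes the cost is the total absolute deviation from the mean region count, scaled between the best achievable and worst possible deviation. That formula is an assumption about the balancer's internals, not something shown in this thread, and the function names are illustrative. It does reproduce the logged value for these counts.

```python
import math


def scale(min_value, max_value, value):
    """Scale value into [0, 1] relative to min_value and max_value."""
    if max_value <= min_value or value <= min_value:
        return 0.0
    return max(0.0, min(1.0, (value - min_value) / (max_value - min_value)))


def region_count_skew(regions_per_server):
    """Assumed skew cost: scaled total deviation from the mean region count."""
    count = len(regions_per_server)
    if count == 0:
        return 0.0
    total = sum(regions_per_server)
    mean = total / count

    # Worst case: every region sits on a single server.
    worst = (count - 1) * mean + (total - mean)

    # Best case: regions spread as evenly as whole numbers allow.
    if total % count == 0:
        best = 0.0
    else:
        num_high = total - math.floor(mean) * count
        num_low = count - num_high
        best = (num_high * (math.ceil(mean) - mean)
                + num_low * (mean - math.floor(mean)))

    deviation = sum(abs(n - mean) for n in regions_per_server)
    return scale(best, worst, deviation)


if __name__ == "__main__":
    print(region_count_skew([3, 6, 3, 1]))  # counts from the imbalanced status
    print(region_count_skew([3, 3, 4, 3]))  # counts from the initial status
```

Under this assumption, the imbalanced layout scores 4/18 (deviation 5.5, best 1.5, worst 19.5), while the initial 3/3/4/3 layout scores 0 because 13 regions cannot be spread more evenly across 4 servers.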

*{{Re-enable balancer}}*

{{hbase:011:0> balance_switch 'true'}}
{{Previous balancer state : false}}
{{Took 0.0095 seconds}}
{{=> false}}

*{{Balance the cluster}}*

{{hbase:012:0> balancer}}
{{true}}
{{Took 1.3375 seconds}}
{{=> true}}

*{{Check logs on active master, balancer calculated a similar plan but this 
time moved regions}}*

{{2021-07-28 20:01:53,375 INFO 
org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer: Start Generate 
Balance plan for cluster.}}
{{2021-07-28 20:01:53,375 DEBUG 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: 
RegionCountSkewCostFunction sees a total of 4 servers and 13 regions.}}
{{2021-07-28 20:01:53,375 DEBUG 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: We need to load 
balance cluster; total cost=141.75213675213675, sum multiplier=685.0; 
cost/multiplier to need a balance is 0.05}}
{{2021-07-28 20:01:53,376 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: start 
StochasticLoadBalancer.balancer, initCost=141.75213675213675, 
functionCost=RegionCountSkewCostFunction : (500.0, 0.2222222222222222); 
PrimaryRegionCountSkewCostFunction : (500.0, 0.0); MoveCostFunction : 
(100.0, 0.0); ServerLocalityCostFunction : (25.0, 0.0); 
TableSkewCostFunction : (35.0, 0.5897435897435898); 
RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction 
: (10000.0, 0.0); ReadRequestCostFunction : (10.0, 1.0); 
WriteRequestCostFunction : (5.0, 0.0); MemStoreSizeCostFunction : (5.0, 0.0); 
StoreFileCostFunction : (5.0, 0.0); computedMaxSteps: 41600}}
{{2021-07-28 20:01:53,499 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished 
computing new load balance plan. Computation took PT0.124S to try 41600 
different iterations. Found a solution that moves 2 regions; Going from a 
computed cost of 141.75213675213675 to a new cost of 38.84615384615385}}
{{2021-07-28 20:01:53,499 INFO org.apache.hadoop.hbase.master.HMaster: Balancer 
plans size is 2, the balance interval is 90000 ms, and the max number regions 
in transition is 13}}
{{2021-07-28 20:01:53,499 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=740b4fdcbc4c68c37c80b98e10be5726, source=regionserver2,60020,1627498037868, 
destination=regionserver4,60020,1627498059869}}

{{... snip remaining log spam ...}}
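
Both balancer runs log computedMaxSteps: 41600 and report trying 41600 iterations. That figure is consistent with regions x steps-per-region x servers, capped at a configured maximum. The Python sketch below shows the arithmetic. The defaults of 800 steps per region and a 1,000,000 step cap are assumptions about the stochastic balancer's configuration and are not stated in this thread.

```python
def computed_max_steps(num_regions, num_servers,
                       steps_per_region=800, max_steps=1_000_000):
    """Assumed step budget: regions * steps_per_region * servers, capped.

    steps_per_region and max_steps are assumed defaults, not values
    taken from the log.
    """
    return min(max_steps, num_regions * steps_per_region * num_servers)


if __name__ == "__main__":
    # 13 regions on 4 live servers, as in the status output.
    print(computed_max_steps(num_regions=13, num_servers=4))
```

If the assumption holds, the step budget grows with cluster size, so a dry run on a larger cluster would be expected to take longer than the 0.12 seconds logged here.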

*{{Final status -- cluster is re-balanced}}*

{{hbase:013:0> status 'simple'}}
{{active master: master1:60000 1627497753352}}
{{1 backup masters}}
{{  master2:60000 1627497808648}}
{{4 live servers}}
{{  regionserver1:60020 1627500965175}}
{{    requestsPerSecond=1.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver2:60020 1627498037868}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=4, snip...}}
{{  regionserver3:60020 1627498045321}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{  regionserver4:60020 1627498059869}}
{{    requestsPerSecond=0.0, numberOfOnlineRegions=3, snip...}}
{{0 dead servers}}
{{Aggregate load: 1, regions: 13}}
{{Took 0.0158 seconds}}

> Add dry run mode to hbase balancer
> ----------------------------------
>
>                 Key: HBASE-26147
>                 URL: https://issues.apache.org/jira/browse/HBASE-26147
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, master
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> It's often rather hard to know how the cost function changes you're making 
> will affect the balance of the cluster, and currently the only way to know is 
> to run it. If the cost decisions are not good, you may have just moved many 
> regions towards a non-ideal balance. Region moves themselves are not free for 
> clients, and the resulting balance may cause a regression.
> We should add a mode to the balancer so that it can be invoked without 
> actually executing any plans. This will allow an administrator to iterate on 
> their cost functions and use the balancer's logging to see how their changes 
> would affect the cluster. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
