[jira] [Comment Edited] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

Xu Cang (JIRA) Fri, 28 Jun 2019 16:07:03 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875277#comment-16875277
 ]


Xu Cang edited comment on HBASE-22349 at 6/28/19 11:06 PM:
-----------------------------------------------------------

This is a very good observation. One of my co-workers observed and debugged a 
similar issue in our environment.

Obviously we don't want an RS to hold 0 regions while the LB still thinks the 
cluster is 'balanced'. Besides tweaking 'minCostNeedBalance', maybe we can 
introduce a rule that when an RS holds 0 regions, balancing is still triggered 
regardless. 

Or, we can adjust cost() for this class:

static class PrimaryRegionCountSkewCostFunction

to make this factor weigh more than the others?
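
To make the first idea concrete, here is a minimal sketch of such a rule. It is 
not the actual StochasticLoadBalancer code: the class IdleServerRule and the 
methods hasIdleServer and shouldBalance are hypothetical names, and the 
cost-based decision is passed in as a plain boolean rather than computed.

```java
import java.util.Arrays;

public class IdleServerRule {

    // Hypothetical helper: true when some server holds 0 regions even though
    // there are enough regions for every server to hold at least one.
    static boolean hasIdleServer(int[] regionsPerServer) {
        int total = Arrays.stream(regionsPerServer).sum();
        boolean anyEmpty = Arrays.stream(regionsPerServer).anyMatch(c -> c == 0);
        return anyEmpty && total >= regionsPerServer.length;
    }

    // Hypothetical decision: an idle server forces balancing; otherwise defer
    // to whatever the cost-based check (minCostNeedBalance) decided.
    static boolean shouldBalance(int[] regionsPerServer, boolean costCheckSaysBalance) {
        return hasIdleServer(regionsPerServer) || costCheckSaysBalance;
    }

    public static void main(String[] args) {
        // Region counts from the issue description after the node was replaced.
        int[] afterReplace = {23, 0, 23, 22, 22, 22, 22, 23, 23, 23};
        // The cost-based check said "balanced" (false), but the rule overrides it.
        System.out.println(shouldBalance(afterReplace, false));
    }
}
```

With the counts from the issue, the rule would trigger balancing even though 
the cost check alone reports a balanced cluster.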



> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22349
>                 URL: https://issues.apache.org/jira/browse/HBASE-22349
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.4
>            Reporter: Suthan Phillips
>            Priority: Major
>         Attachments: Hbase-22349.pdf
>
>
> In an EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance of 0.05 is too high.
> The region counts on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node (region server) was replaced with a new node (terminated and 
> recreated by EMR), the region counts on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From the hbase-master logs, I can see the messages below, which indicate that 
> the default minCostNeedBalance does not work well for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] cleaner.CleanerChore: WALs outstanding under hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs
> 2019-04-29 09:31:42,920 INFO  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost which need balance is 0.05
> ##
> To mitigate this, I had to lower minCostNeedBalance from its default to a 
> value like 0.01f and restart the Region Servers and the HBase Master. After 
> changing this value to 0.01f, I could see the regions getting rebalanced.
> This has led me to the following questions, which I would like the HBase 
> experts to answer.
> 1) What factors affect the values of total cost and sum multiplier? How can 
> we determine the right minCostNeedBalance value for a given cluster?
> 2) How did HBase arrive at the default value of 0.05f? Is it the optimal 
> value? If yes, what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.
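
For reference, the numbers in the log line above can be checked by hand. The 
sketch below assumes the balancer skips a run when the total cost divided by 
the sum multiplier falls below minCostNeedBalance; that is my reading of the 
log message, not a copy of the HBase source. The class MinCostCheck and its 
methods are hypothetical names.

```java
public class MinCostCheck {

    // Assumed normalization: total cost divided by the sum of the multipliers.
    static double normalizedCost(double totalCost, double sumMultiplier) {
        return totalCost / sumMultiplier;
    }

    // Assumed skip condition: nothing to weigh, or normalized cost under the threshold.
    static boolean skipsBalancing(double totalCost, double sumMultiplier,
                                  double minCostNeedBalance) {
        return totalCost <= 0
            || sumMultiplier <= 0
            || normalizedCost(totalCost, sumMultiplier) < minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Values taken from the StochasticLoadBalancer log line in the issue.
        double totalCost = 52.041826194833405;
        double sumMultiplier = 1102.0;

        System.out.printf("normalized cost = %.4f%n",
            normalizedCost(totalCost, sumMultiplier));
        System.out.println("skips at 0.05: "
            + skipsBalancing(totalCost, sumMultiplier, 0.05));
        System.out.println("skips at 0.01: "
            + skipsBalancing(totalCost, sumMultiplier, 0.01));
    }
}
```

Under that assumption the normalized cost is roughly 0.047, just under the 
0.05 default, so the run is skipped; with 0.01 it is not, which matches what 
the reporter saw after lowering the value.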



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
