[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

Suthan Phillips (JIRA) Wed, 01 May 2019 23:26:48 -0700


     [ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Suthan Phillips updated HBASE-22349:
------------------------------------
    Affects Version/s: 1.4.4
           Attachment: Hbase-22349.pdf
          Description: 
In an EMR cluster, whenever I replace one of the nodes, the regions never get 
rebalanced.

The default minCostNeedBalance of 0.05 is too high.

The region counts on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
(203 in total).

Once a node (region server) was replaced with a new one (terminated and 
recreated by EMR), the region counts on the servers became: 23, 0, 23, 22, 22, 
22, 22, 23, 23, 23 (203 in total).
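
To gauge how much imbalance the balancer might register, the sketch below estimates a region-count skew cost for both distributions. It assumes the cost is the sum of absolute deviations from the mean region count, scaled to [0, 1] between the most even and the most lopsided placement, and that region-count skew carries a weight of 500. Both the formula and the weight are a reading of the 1.x StochasticLoadBalancer and should be verified against the source for your version. The names `region_count_skew` and `ASSUMED_WEIGHT` are hypothetical.

```python
import math

def region_count_skew(counts):
    """Scaled skew in [0, 1]; an assumed approximation, not HBase's code."""
    total = sum(counts)
    n = len(counts)
    mean = total / n
    # Worst case: every region sits on a single server.
    worst = (n - 1) * mean + (total - mean)
    # Best case: server counts differ by at most one region.
    if n > total:
        best = (n - total) * mean + (1 - mean) * total
    else:
        num_high = total - math.floor(mean) * n
        num_low = n - num_high
        best = (num_high * (math.ceil(mean) - mean)
                + num_low * (mean - math.floor(mean)))
    best = max(0.0, best)
    deviation = sum(abs(mean - c) for c in counts)
    if worst <= best or deviation <= best:
        return 0.0
    return max(0.0, min(1.0, (deviation - best) / (worst - best)))

ASSUMED_WEIGHT = 500  # assumed multiplier for region-count skew

before = [21, 21, 20, 20, 20, 20, 21, 20, 20, 20]
after = [23, 0, 23, 22, 22, 22, 22, 23, 23, 23]

for label, counts in (("before", before), ("after", after)):
    skew = region_count_skew(counts)
    print(f"{label}: skew={skew:.4f}, weighted={skew * ASSUMED_WEIGHT:.2f}")
```

Under these assumptions the even distribution scores essentially zero and the post-replacement distribution about 0.10, or roughly 50 after weighting, which is close to the total cost of 52.04 reported in the master log.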

From the HBase master logs, I can see the messages below. The balancer's INFO 
line indicates that the default minCostNeedBalance does not work well for 
this scenario.

##

2019-04-29 09:31:37,027 WARN  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] cleaner.CleanerChore: WALs outstanding under hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs

2019-04-29 09:31:42,920 INFO  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost which need balance is 0.05

##
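
The numbers in the INFO line can be checked by hand. The sketch below assumes the balancer skips a run when the total cost divided by the sum multiplier falls below minCostNeedBalance, which is how the skip condition in StochasticLoadBalancer appears to work in the 1.4 line. The function name `needs_balance` is illustrative and not part of HBase.

```python
def needs_balance(total_cost: float, sum_multiplier: float,
                  min_cost_need_balance: float) -> bool:
    """Assumed rule: balance only if the normalized cost reaches the threshold."""
    if total_cost <= 0 or sum_multiplier <= 0:
        return False
    return total_cost / sum_multiplier >= min_cost_need_balance

# Values copied from the master log line.
total_cost = 52.041826194833405
sum_multiplier = 1102.0
normalized = total_cost / sum_multiplier  # roughly 0.0472

print(f"normalized cost = {normalized:.4f}")
print("balance at 0.05:", needs_balance(total_cost, sum_multiplier, 0.05))
print("balance at 0.01:", needs_balance(total_cost, sum_multiplier, 0.01))
```

Under that assumption the normalized cost is about 0.047, just under the 0.05 default but well above 0.01, which is consistent with the skipped run at the default and with rebalancing at a threshold of 0.01.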

To mitigate this, I had to lower minCostNeedBalance from the default to 0.01f 
and restart the Region Servers and the HBase Master. After this change, the 
regions were rebalanced.

This has led me to the following questions, which I would like the HBase 
experts to answer.

1) What factors affect the values of total cost and sum multiplier? How can we 
determine the right minCostNeedBalance value for a given cluster?

2) How did HBase arrive at the default value of 0.05f? Is it the optimal value? 
If yes, what is the recommended way to mitigate this scenario?

Attached: Steps to reproduce

 

Note: The HBASE-17565 patch is already applied.
              Summary: Stochastic Load Balancer skips balancing when node is 
replaced in cluster  (was: eifjccgngfnjugrvnklblbflhjfehbbckhcktubbnvur)

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22349
>                 URL: https://issues.apache.org/jira/browse/HBASE-22349
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.4
>            Reporter: Suthan Phillips
>            Priority: Major
>         Attachments: Hbase-22349.pdf
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
