[jira] [Updated] (HDDS-5757) balancer should stop when the cluster can not be more balanced

Jackson Yao (Jira) Thu, 16 Sep 2021 23:54:07 -0700


     [ https://issues.apache.org/jira/browse/HDDS-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jackson Yao updated HDDS-5757:
------------------------------
    Description: 
When I tested the container balancer in a k8s cluster, I ran the command:

./ozone admin containerbalancer start -i -1 -t 0.000001 -d 1 -s 500

with the container size set to 1G.

I found that the balancer thread does not stop when the cluster is close to
balanced.

In my cluster, three datanodes d1, d2, d3 have disk usages of 67G, 67G and 68G,
and a fourth datanode d4 has a disk usage of 1G. When I start the balancer, it
begins to balance and works well: many containers are moved from d1, d2, d3 to
d4. But when the cluster is close to balanced (the disk usages of the four
datanodes are 50G, 52G, 50G, 51G), the balancer keeps running and moves
containers among those datanodes again and again.

For example, suppose we have two datanodes d1 and d2 with disk usages of 3G and
7G respectively, so the average usage is 5G. If we set the threshold to 0.00001
(close to 0), the lowerLimit and upperLimit are both close to 5G. If we set the
container size to 4G, then in the first iteration the balancer identifies d2 as
over-utilized and d1 as under-utilized, and schedules a move of a 4G container
from d2 to d1. After the move finishes, d1 is at 7G and d2 at 3G: the imbalance
has only swapped sides, so the next iteration schedules the reverse move and the
balancer thread goes on forever.
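The ping-pong in this two-datanode example can be reproduced with a small
simulation. It is a hypothetical sketch, not Ozone code: with the threshold
close to 0, "over-utilized" and "under-utilized" reduce to "the larger" and "the
smaller" of the two usages, and each iteration moves one container from the
larger to the smaller without checking whether that reduces the imbalance.

```java
/**
 * Hypothetical simulation of the two-datanode example: each iteration moves
 * one container from the more-used node to the less-used node, with no check
 * on whether the move actually reduces the imbalance.
 */
public class PingPongDemo {

    /** Runs the naive loop and returns {d1, d2} after the given number of iterations. */
    public static double[] runNaive(double d1, double d2, double containerGb, int iterations) {
        for (int i = 1; i <= iterations; i++) {
            if (d1 > d2) {           // d1 over-utilized, d2 under-utilized
                d1 -= containerGb;
                d2 += containerGb;
            } else if (d2 > d1) {    // d2 over-utilized, d1 under-utilized
                d2 -= containerGb;
                d1 += containerGb;
            } else {
                break;               // perfectly balanced, nothing to move
            }
            System.out.printf("after iteration %d: d1=%.0fG d2=%.0fG%n", i, d1, d2);
        }
        return new double[] {d1, d2};
    }

    public static void main(String[] args) {
        // d1 = 3G, d2 = 7G, container size = 4G, as in the report.
        runNaive(3, 7, 4, 4);
    }
}
```

The printed usages alternate between d1=7G d2=3G and d1=3G d2=7G, so without an
iteration limit the loop never terminates.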

So the balancer thread should exit when the cluster cannot be balanced any
further.
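One possible way to express "cannot be balanced any further" is sketched below.
It is an assumption for illustration, not necessarily what HDDS-5757 ended up
implementing, and all names are hypothetical. Moving a container of size s from
a source with usage a to a target with usage b (a > b) changes their gap from
a - b to |a - b - 2s|. That gap shrinks only when 0 < s < a - b. If no candidate
move passes this check in an iteration, the balancer stops.

```java
/**
 * Hypothetical stop condition for a balancer iteration: only schedule a move
 * if it strictly shrinks the gap between source and target, and stop when no
 * move qualifies.
 */
public class StopWhenNoProgress {

    /** True if moving containerGb from source to target strictly reduces their gap. */
    public static boolean improvesBalance(double sourceGb, double targetGb, double containerGb) {
        double gap = sourceGb - targetGb;
        // The new gap is |gap - 2 * containerGb|, which is smaller iff 0 < containerGb < gap.
        return containerGb > 0 && containerGb < gap;
    }

    /** Runs the two-node loop with the guard and returns the number of moves made. */
    public static int runGuarded(double d1, double d2, double containerGb, int maxIterations) {
        int moves = 0;
        for (int i = 0; i < maxIterations; i++) {
            double source = Math.max(d1, d2);
            double target = Math.min(d1, d2);
            if (!improvesBalance(source, target, containerGb)) {
                System.out.println("no improving move; stopping balancer");
                break;
            }
            // Move one container from the fuller node to the emptier node.
            if (d1 >= d2) {
                d1 -= containerGb;
                d2 += containerGb;
            } else {
                d2 -= containerGb;
                d1 += containerGb;
            }
            moves++;
        }
        return moves;
    }

    public static void main(String[] args) {
        System.out.println("moves with 4G containers: " + runGuarded(3, 7, 4, 10));
        System.out.println("moves with 1G containers: " + runGuarded(3, 7, 1, 10));
    }
}
```

With 4G containers the guard rejects the first move (4G is not smaller than the
4G gap) and the loop stops with 0 moves. With 1G containers it makes two moves
(3G/7G to 4G/6G to 5G/5G) and then stops. Applied pairwise to the four-datanode
numbers with 1G containers, the same check would let the 52G node hand one
container to a 50G node and then stop, because the remaining largest gap is
exactly 1G.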

 

  was:
When I tested the container balancer in a k8s cluster, I ran the command:

./ozone admin containerbalancer start -i -1 -t 0.000001 -d 1 -s 500

with the container size set to 1G.

I found that the balancer thread does not stop when the cluster is close to
balanced.

Suppose we have three datanodes d1, d2, d3 (disk usages 67G, 67G, 68G) and a
fourth datanode d4 with a disk usage of 1G. When I start the balancer, it begins
to balance and works well: many containers are moved from d1, d2, d3 to d4. But
when the cluster is close to balanced (the disk usages of the four datanodes are
50G, 52G, 50G, 51G), the balancer keeps running and moves containers among
those datanodes again and again.

For example, suppose we have two datanodes d1 and d2 with disk usages of 3G and
7G respectively, so the average usage is 5G. If we set the threshold to 0.00001
(close to 0), the lowerLimit and upperLimit are both close to 5G. If we set the
container size to 4G, then in the first iteration the balancer identifies d2 as
over-utilized and d1 as under-utilized, and schedules a move of a 4G container
from d2 to d1. After the move finishes, d1 is at 7G and d2 at 3G: the imbalance
has only swapped sides, so the balancer thread goes on forever.

So the balancer thread should exit when the cluster cannot be balanced any
further.

 


> balancer should stop when the cluster can not be more balanced
> --------------------------------------------------------------
>
>                 Key: HDDS-5757
>                 URL: https://issues.apache.org/jira/browse/HDDS-5757
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Jackson Yao
>            Assignee: Jackson Yao
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
