[jira] [Work logged] (HDFS-14904) Option to let Balancer prefer top used nodes in each iteration

ASF GitHub Bot (Jira) Tue, 01 Dec 2020 11:02:05 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-14904?focusedWorklogId=518601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518601
 ]


ASF GitHub Bot logged work on HDFS-14904:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Dec/20 19:01
            Start Date: 01/Dec/20 19:01
    Worklog Time Spent: 10m 
      Work Description: LeonGao91 commented on a change in pull request #2483:
URL: https://github.com/apache/hadoop/pull/2483#discussion_r533650857



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
##########
@@ -435,6 +444,22 @@ private long init(List<DatanodeStorageReport> reports) {
     return Math.max(overLoadedBytes, underLoadedBytes);
   }
 
+  private void sortOverUtilizedNodes() {
+    LOG.info("Sorting over-utilized nodes by capacity" +
+        " to bring down top used datanode capacity faster");
+
+    if (overUtilized instanceof List) {
+      List<Source> list = (List<Source>) overUtilized;
+      list.sort(
+          (Source source1, Source source2) ->

Review comment:
       Good idea, we should use the utilization of the storage type instead 
of the datanode utilization. Will fix.
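
A minimal sketch of the ordering discussed in this comment, assuming each 
source exposes the used and total bytes of its own storage type. 
`SourceSketch`, its fields, and `sortMostUtilizedFirst` are hypothetical 
names used only for illustration; they are not the Balancer's real `Source` 
API, and the actual patch may compute utilization differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByStorageUtilization {

  /** Hypothetical stand-in for a balancer source; not the real Source class. */
  static final class SourceSketch {
    final String name;
    final long storageTypeUsedBytes;      // bytes used on this storage type
    final long storageTypeCapacityBytes;  // total bytes of this storage type

    SourceSketch(String name, long usedBytes, long capacityBytes) {
      this.name = name;
      this.storageTypeUsedBytes = usedBytes;
      this.storageTypeCapacityBytes = capacityBytes;
    }

    /** Utilization of this storage type as a percentage (0 if capacity is 0). */
    double utilizationPercent() {
      if (storageTypeCapacityBytes == 0) {
        return 0.0;
      }
      return 100.0 * storageTypeUsedBytes / storageTypeCapacityBytes;
    }
  }

  /** Sorts in place so that the most utilized storage comes first. */
  static void sortMostUtilizedFirst(List<SourceSketch> sources) {
    Comparator<SourceSketch> byUtilization =
        Comparator.comparingDouble(SourceSketch::utilizationPercent);
    sources.sort(byUtilization.reversed());
  }

  public static void main(String[] args) {
    List<SourceSketch> sources = new ArrayList<>();
    sources.add(new SourceSketch("dn1", 50L, 100L));
    sources.add(new SourceSketch("dn2", 90L, 100L));
    sources.add(new SourceSketch("dn3", 700L, 1000L));
    sortMostUtilizedFirst(sources);
    for (SourceSketch s : sources) {
      System.out.println(s.name + " " + s.utilizationPercent());
    }
  }
}
```

Comparing percentages rather than raw used bytes keeps a large, lightly 
used storage from being ranked ahead of a small one that is nearly full.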




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 518601)
    Time Spent: 1h 10m  (was: 1h)

> Option to let Balancer prefer top used nodes in each iteration
> --------------------------------------------------------------
>
>                 Key: HDFS-14904
>                 URL: https://issues.apache.org/jira/browse/HDFS-14904
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>            Reporter: Leon Gao
>            Assignee: Leon Gao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Normally, the most important purpose of the HDFS balancer is to bring down 
> the top used nodes, so that datanode usage does not get too high.
> Currently, the balancer picks source nodes almost randomly, regardless of 
> usage, which makes it slow to bring down the top used datanodes in the 
> cluster when there are few underutilized nodes (consider an expansion).
> We can add an option to prefer the top used nodes first in each iteration, 
> as suggested in HDFS-14894.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
