Jing9 commented on a change in pull request #2483:
URL: https://github.com/apache/hadoop/pull/2483#discussion_r532981333



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
##########
@@ -435,6 +444,22 @@ private long init(List<DatanodeStorageReport> reports) {
     return Math.max(overLoadedBytes, underLoadedBytes);
   }
 
+  private void sortOverUtilizedNodes() {
+    LOG.info("Sorting over-utilized nodes by capacity" +
+        " to bring down top used datanode capacity faster");
+
+    if (overUtilized instanceof List) {
+      List<Source> list = (List<Source>) overUtilized;
+      list.sort(
+          (Source source1, Source source2) ->

Review comment:
       Do we need to consider StorageType (which is associated with Source)? 
E.g., suppose a DN has 2 storage types, one of which is highly utilized and the 
other is just above average. Do we want to first schedule the movement for the 
highly-utilized storage type on this node? 
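One way to read this suggestion is to rank each (datanode, storage type) pair by the utilization of that storage type, rather than ranking whole datanodes. The sketch below assumes that per-storage-type utilization is `used / capacity` for that storage on that node. `StorageUtil` is a hypothetical stand-in for `Source`, and the real `Source` class is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: order over-utilized (datanode, storage type) pairs so the
 * most highly utilized storage is scheduled first. StorageUtil is an illustrative
 * stand-in for the Balancer's Source, not its real API.
 */
public class StorageTypeAwareSort {

  /** Illustrative stand-in: one storage type on one datanode. */
  static final class StorageUtil {
    final String datanode;
    final String storageType;
    final long used;
    final long capacity;

    StorageUtil(String datanode, String storageType, long used, long capacity) {
      this.datanode = datanode;
      this.storageType = storageType;
      this.used = used;
      this.capacity = capacity;
    }

    /** Assumed metric: percent of this storage type's capacity in use on this node. */
    double utilization() {
      return capacity == 0 ? 0.0 : 100.0 * used / capacity;
    }
  }

  /** Returns a copy sorted by per-storage-type utilization, highest first. */
  static List<StorageUtil> sortByUtilizationDesc(List<StorageUtil> overUtilized) {
    List<StorageUtil> sorted = new ArrayList<>(overUtilized);
    sorted.sort(Comparator.comparingDouble(StorageUtil::utilization).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    // dn1 has one highly utilized storage type (SSD) and one just above average (DISK).
    List<StorageUtil> nodes = List.of(
        new StorageUtil("dn1", "DISK", 62, 100),
        new StorageUtil("dn1", "SSD", 95, 100),
        new StorageUtil("dn2", "DISK", 80, 100));
    for (StorageUtil s : sortByUtilizationDesc(nodes)) {
      System.out.println(s.datanode + "/" + s.storageType + " " + s.utilization());
    }
  }
}
```

With this ordering, dn1's SSD is scheduled ahead of dn2's DISK, while dn1's less-utilized DISK falls behind both.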

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
##########
@@ -435,6 +444,22 @@ private long init(List<DatanodeStorageReport> reports) {
     return Math.max(overLoadedBytes, underLoadedBytes);
   }
 
+  private void sortOverUtilizedNodes() {
+    LOG.info("Sorting over-utilized nodes by capacity" +
+        " to bring down top used datanode capacity faster");
+
+    if (overUtilized instanceof List) {

Review comment:
       Do we need this "if" statement? Maybe use a Preconditions check instead?
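A minimal sketch of the fail-fast alternative, assuming the intent is to reject a non-`List` collection loudly instead of silently skipping the sort. To keep the example self-contained, `checkState` here is a hypothetical local helper that mirrors Guava's `Preconditions.checkState(boolean, Object)`. `sortOverUtilized` is likewise illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;

/**
 * Sketch: assert the collection type up front instead of wrapping the sort
 * in an "if" that quietly does nothing for other collection types.
 */
public class PreconditionSketch {

  /** Local stand-in for Guava's Preconditions.checkState(boolean, Object). */
  static void checkState(boolean expression, Object errorMessage) {
    if (!expression) {
      throw new IllegalStateException(String.valueOf(errorMessage));
    }
  }

  /** Sorts in place; throws IllegalStateException if the collection is not a List. */
  @SuppressWarnings("unchecked")
  static <T> void sortOverUtilized(Collection<T> overUtilized, Comparator<? super T> order) {
    checkState(overUtilized instanceof List,
        "overUtilized must be a List to be sorted, but was "
            + overUtilized.getClass().getName());
    ((List<T>) overUtilized).sort(order);
  }

  public static void main(String[] args) {
    List<Integer> usage = new ArrayList<>(List.of(40, 95, 70));
    sortOverUtilized(usage, Comparator.reverseOrder());
    System.out.println(usage);

    try {
      sortOverUtilized(new HashSet<>(List.of(1, 2)), Comparator.<Integer>naturalOrder());
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The trade-off is that a future change to the field's collection type surfaces immediately as an exception rather than as a balancer run that never sorts.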

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
##########
@@ -199,7 +199,10 @@
       + "\tWhether to run the balancer during an ongoing HDFS upgrade."
       + "This is usually not desired since it will not affect used space "
       + "on over-utilized machines."
-      + "\n\t[-asService]\tRun as a long running service.";
+      + "\n\t[-asService]\tRun as a long running service."
+      + "\n\t[-sortTopNodes]"
+      + "\tSort over-utilized nodes by capacity to"
+      + " bring down top used datanode faster.";

Review comment:
       How about describing the parameter option as: "sort datanodes based on the 
utilization so that highly utilized datanodes get scheduled first"?
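For illustration, the `-sortTopNodes` usage entry with the suggested wording might look like the following. This is a sketch only: the constant name `SORT_TOP_NODES_USAGE` is hypothetical, and the example does not reproduce how the Balancer assembles its full usage text. Each fragment begins with a tab or space, so the concatenated pieces do not run words together.

```java
/**
 * Hypothetical sketch of the -sortTopNodes usage entry using the
 * reviewer's suggested description.
 */
public class SortTopNodesUsage {

  // Each fragment starts with its own separator so concatenation stays readable.
  static final String SORT_TOP_NODES_USAGE =
      "\n\t[-sortTopNodes]"
      + "\tSort datanodes based on their utilization so that"
      + " highly utilized datanodes get scheduled first.";

  public static void main(String[] args) {
    System.out.println(SORT_TOP_NODES_USAGE);
  }
}
```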



