[ 
https://issues.apache.org/jira/browse/HDFS-16203?focusedWorklogId=649706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-649706
 ]

ASF GitHub Bot logged work on HDFS-16203:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Sep/21 13:48
            Start Date: 12/Sep/21 13:48
    Worklog Time Spent: 10m 
      Work Description: tomscut commented on a change in pull request #3366:
URL: https://github.com/apache/hadoop/pull/3366#discussion_r706840884



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
##########
@@ -6537,14 +6537,45 @@ public String getLiveNodes() {
       if (node.getUpgradeDomain() != null) {
         innerinfo.put("upgradeDomain", node.getUpgradeDomain());
       }
+      StorageReport[] storageReports = node.getStorageReports();
+      innerinfo.put("blockPoolUsedPercentStdDev",
+          getBlockPoolUsedPercentStdDev(storageReports));
       info.put(node.getXferAddrWithHostname(), innerinfo.build());
     }
     return JSON.toString(info);
   }
 
+  /**
+   * Return the standard deviation of storage block pool usage.
+   */
+  @VisibleForTesting
+  public float getBlockPoolUsedPercentStdDev(StorageReport[] storageReports) {
+    ArrayList<Float> usagePercentList = new ArrayList<>();
+    float totalUsagePercent = 0.0f;
+    float dev = 0.0f;
+
+    if (storageReports.length == 0) {
+      return dev;
+    }
+
+    for (StorageReport s : storageReports) {
+      usagePercentList.add(s.getBlockPoolUsagePercent());
+      totalUsagePercent += s.getBlockPoolUsagePercent();
+    }
+
+    totalUsagePercent /= storageReports.length;
+    Collections.sort(usagePercentList);

Review comment:
       @ferhui A float or double may lose precision during arithmetic, so 
operating on the same values in a different order can give slightly 
inconsistent results. After removing ```Collections.sort(usagePercentList);```, 
I only compare two decimal places in the assertion. Please take a look at 
this. Thank you very much.
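
A minimal, hypothetical sketch of the point above (the class, its ```stdDev``` 
helper, and the sample values are illustrative assumptions, not the PR's code 
or test): float addition is not associative, so evaluating the same values in 
a different order can change the low-order bits of the result, which is why 
the assertion compares only two decimal places.

```java
// Illustrative only: not the actual FSNamesystem method or its unit test.
public class FloatOrderSketch {

  // Hypothetical helper mirroring the idea of getBlockPoolUsedPercentStdDev:
  // population standard deviation of per-volume usage percentages.
  static float stdDev(float[] usages) {
    if (usages.length == 0) {
      return 0.0f;
    }
    float mean = 0.0f;
    for (float u : usages) {
      mean += u;
    }
    mean /= usages.length;
    float dev = 0.0f;
    for (float u : usages) {
      dev += (u - mean) * (u - mean);
    }
    return (float) Math.sqrt(dev / usages.length);
  }

  public static void main(String[] args) {
    // Same values in a different order: the intermediate sums can differ
    // in their low-order bits.
    float[] ascending  = {10.1f, 20.2f, 30.3f, 40.4f};
    float[] descending = {40.4f, 30.3f, 20.2f, 10.1f};

    float s1 = stdDev(ascending);
    float s2 = stdDev(descending);

    // Rounding to two decimal places before comparing keeps an assertion
    // stable regardless of evaluation order.
    float r1 = Math.round(s1 * 100) / 100.0f;
    float r2 = Math.round(s2 * 100) / 100.0f;
    System.out.println(r1 + " == " + r2 + " ? " + (r1 == r2));
  }
}
```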




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 649706)
    Time Spent: 3h  (was: 2h 50m)

> Discover datanodes with unbalanced block pool usage by the standard deviation
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16203
>                 URL: https://issues.apache.org/jira/browse/HDFS-16203
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-09-01-19-16-27-172.png
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> *Discover datanodes with unbalanced volume usage by the standard deviation.*
> *In some scenarios, datanode disk usage can become unbalanced:*
>  1. A damaged disk is repaired and brought back online.
>  2. Disks are added to some datanodes.
>  3. Some disks are damaged, resulting in slow data writes.
>  4. A custom volume choosing policy is used.
> When disk usage is unbalanced, a sudden increase in datanode write traffic 
> may lead to busy disk I/O on the volumes with low usage, decreasing 
> throughput across datanodes.
> We need to find these nodes in time so we can run the disk balancer or take 
> other action. Based on the volume usage of each datanode, we can calculate 
> the standard deviation of the volume usage: the more unbalanced the volumes, 
> the higher the standard deviation (a sketch of the formula follows this 
> description).
> *We can display the result on the namenode web UI and sort by it to find the 
> nodes whose volume usage is unbalanced.*
> *{color:#172b4d}This interface is only used to obtain metrics and does not 
> adversely affect namenode performance.{color}*
>  
> {color:#172b4d}!image-2021-09-01-19-16-27-172.png|width=581,height=216!{color}
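
For reference, a sketch of the statistic described above, in standard notation 
(not necessarily the exact form used in the patch): with per-volume block pool 
usage percentages $p_1, \dots, p_n$ on a datanode,

$$\mu = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad
  \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - \mu)^2}.$$

The wider the spread of the $p_i$, the larger $\sigma$, so sorting datanodes 
by $\sigma$ surfaces the ones with the most unbalanced volumes.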



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
