[
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793863
]
ASF GitHub Bot logged work on HDFS-16678:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jul/22 17:31
Start Date: 21/Jul/22 17:31
Worklog Time Spent: 10m
Work Description: goiri commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r926936268
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
Review Comment:
I would do:
```
DatanodeInfo[] live = null;
if (this.enableGetDNUsage) {
  RouterRpcServer rpcServer = this.router.getRpcServer();
  live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE,
      false, timeOut);
} else {
  LOG.debug("Getting information is disabled."); // similar message
}
if (live != null && live.length > 0) {
```
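For reference, here is the suggestion expanded into a minimal sketch of the whole block (hedged: it reuses the median/max/min/dev locals that getNodeUsage() already declares, and the log message is just the reviewer's placeholder):
```
DatanodeInfo[] live = null;
if (this.enableGetDNUsage) {
  RouterRpcServer rpcServer = this.router.getRpcServer();
  live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
} else {
  LOG.debug("Getting node usage is disabled."); // placeholder message
}

if (live != null && live.length > 0) {
  float totalDfsUsed = 0;
  float[] usages = new float[live.length];
  int i = 0;
  for (DatanodeInfo dn : live) {
    usages[i++] = dn.getDfsUsedPercent();
    totalDfsUsed += dn.getDfsUsedPercent();
  }
  totalDfsUsed /= live.length;
  Arrays.sort(usages);
  median = usages[usages.length / 2];
  max = usages[usages.length - 1];
  min = usages[0];
  for (i = 0; i < usages.length; i++) {
    dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
  }
  dev = (float) Math.sqrt(dev / usages.length);
}
```
Hoisting live out of the conditional keeps the stats computation at one nesting level instead of wrapping the entire block in the feature flag.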
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
Review Comment:
What is the expensive part of this whole block? This line?
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
+            totalDfsUsed += dn.getDfsUsedPercent();
+          }
+          totalDfsUsed /= live.length;
+          Arrays.sort(usages);
+          median = usages[usages.length / 2];
+          max = usages[usages.length - 1];
+          min = usages[0];
+
+          for (i = 0; i < usages.length; i++) {
+            dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+          }
+          dev = (float) Math.sqrt(dev / usages.length);
Review Comment:
Apache Commons Math has a nice StandardDeviation utility.
It might be good to use it directly.
https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/moment/StandardDeviation.html
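For illustration, a minimal sketch of that substitution (hedged: the float[] usages array from the block above is copied into a double[], since StandardDeviation.evaluate() takes doubles, and isBiasCorrected=false matches the divide-by-N computation in the original loop):
```
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation;

// Copy the float[] usages into a double[] for evaluate().
double[] values = new double[usages.length];
for (int j = 0; j < usages.length; j++) {
  values[j] = usages[j];
}
// isBiasCorrected=false divides by N, matching the original
// dev = sqrt(sum((x - mean)^2) / N) calculation.
dev = (float) new StandardDeviation(false).evaluate(values);
```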
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
I guess it is rpcServer.getDatanodeReport(), actually.
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java:
##########
@@ -315,6 +315,9 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic {
       FEDERATION_ROUTER_PREFIX + "dn-report.cache-expire";
   public static final long DN_REPORT_CACHE_EXPIRE_MS_DEFAULT =
       TimeUnit.SECONDS.toMillis(10);
+  public static final String DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY =
+      FEDERATION_ROUTER_PREFIX + "enable.get.dn.usage";
+  public static final boolean DFS_ROUTER_ENABLE_GET_DN_USAGE_DEFAULT = true;
Review Comment:
Can we add a test?
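A hedged sketch of what such a test could look like; startRouterMetrics() is a placeholder for whatever setup the existing RBF metrics tests already use to obtain an RBFMetrics bean:
```
import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.federation.metrics.RBFMetrics;
import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;
import org.junit.Test;

@Test
public void testNodeUsageWithGetDNUsageDisabled() throws Exception {
  Configuration conf = new Configuration();
  conf.setBoolean(RBFConfigKeys.DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY, false);
  // startRouterMetrics() is hypothetical; a real test would reuse the
  // existing federation test harness to start a Router with this conf.
  RBFMetrics metrics = startRouterMetrics(conf);
  // With collection disabled, getNodeUsage() should skip
  // getDatanodeReport() and report the zeroed default stats.
  String nodeUsage = metrics.getNodeUsage();
  assertNotNull(nodeUsage);
}
```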
Issue Time Tracking
-------------------
Worklog Id: (was: 793863)
Time Spent: 0.5h (was: 20m)
> RBF supports disable getNodeUsage() in RBFMetrics
> -------------------------------------------------
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through
> jmx_exporter, and we found that the collection task often failed.
> After tracing, we found that the collection task is blocked at getNodeUsage()
> in RBFMetrics, because it collects every datanode's usage from the
> downstream nameservices. This is a very expensive and almost useless
> operation, because in most scenarios each nameservice contains almost the
> same DNs: we can get the data usage from any one nameservice instead of
> from RBF. So I feel that RBF should support disabling getNodeUsage() in
> RBFMetrics.
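If the key lands as proposed, operators could turn the collection off along these lines (assumption: FEDERATION_ROUTER_PREFIX resolves to dfs.federation.router., making the full key dfs.federation.router.enable.get.dn.usage):
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;

// Assumption: FEDERATION_ROUTER_PREFIX is "dfs.federation.router.",
// so this sets "dfs.federation.router.enable.get.dn.usage" to false.
Configuration conf = new Configuration();
conf.setBoolean(RBFConfigKeys.DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY, false);
```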