[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes

ASF GitHub Bot (Jira) Tue, 26 Apr 2022 22:33:05 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=762716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762716
 ]


ASF GitHub Bot logged work on HDFS-16521:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Apr/22 05:32
            Start Date: 27/Apr/22 05:32
    Worklog Time Spent: 10m 
      Work Description: jojochuang commented on code in PR #4107:
URL: https://github.com/apache/hadoop/pull/4107#discussion_r859381403


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########
@@ -4914,6 +4914,33 @@ int getNumberOfDatanodes(DatanodeReportType type) {
     }
   }
 
+  DatanodeInfo[] slowDataNodesReport() throws IOException {
+    String operationName = "slowDataNodesReport";
+    DatanodeInfo[] datanodeInfos;
+    checkSuperuserPrivilege(operationName);

Review Comment:
   does it need to require super user privilege?



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:
##########
@@ -433,7 +433,7 @@ static int run(DistributedFileSystem dfs, String[] argv, 
int idx) throws IOExcep
    */
   private static final String commonUsageSummary =
     "\t[-report [-live] [-dead] [-decommissioning] " +
-    "[-enteringmaintenance] [-inmaintenance]]\n" +
+      "[-enteringmaintenance] [-inmaintenance] [-slownodes]]\n" +

Review Comment:
   The corresponding documentation needs to update when CLI commands are 
added/updated.



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:
##########
@@ -632,6 +638,20 @@ private static void 
printDataNodeReports(DistributedFileSystem dfs,
     }
   }
 
+  private static void printSlowDataNodeReports(DistributedFileSystem dfs, 
boolean listNodes,

Review Comment:
   Can you provide a sample output? It would be confusing, I guess. I suspect 
you would need some kind of header to distinguish from the other data node 
reports. 



##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java:
##########
@@ -1868,4 +1868,16 @@ BatchedEntries<OpenFileEntry> listOpenFiles(long prevId,
    */
   @AtMostOnce
   void satisfyStoragePolicy(String path) throws IOException;
+
+  /**
+   * Get report on all of the slow Datanodes. Slow running datanodes are 
identified based on
+   * the Outlier detection algorithm, if slow peer tracking is enabled for the 
DFS cluster.
+   *
+   * @return Datanode report for slow running datanodes.
+   * @throws IOException If an I/O error occurs.
+   */
+  @Idempotent
+  @ReadOnly
+  DatanodeInfo[] getSlowDatanodeReport() throws IOException;

Review Comment:
   I just want to check with every one that it is okay to have an array of 
objects as the return value.
   I think it's fine but just want to check with every one, because once we 
decide the the interface it can't be changed later.



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:
##########
@@ -433,7 +433,7 @@ static int run(DistributedFileSystem dfs, String[] argv, 
int idx) throws IOExcep
    */
   private static final String commonUsageSummary =
     "\t[-report [-live] [-dead] [-decommissioning] " +
-    "[-enteringmaintenance] [-inmaintenance]]\n" +
+      "[-enteringmaintenance] [-inmaintenance] [-slownodes]]\n" +

Review Comment:
   In fact it would appear confusion to HDFS administrators. These subcommands 
are meant to filter the DNs in these states, and "slownodes" is not a defined 
DataNode state.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 762716)
    Time Spent: 3.5h  (was: 3h 20m)

> DFS API to retrieve slow datanodes
> ----------------------------------
>
>                 Key: HDFS-16521
>                 URL: https://issues.apache.org/jira/browse/HDFS-16521
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Providing DFS API to retrieve slow nodes would help add an additional option 
> to "dfsadmin -report" that lists slow datanodes info for operators to take a 
> look, specifically useful filter for larger clusters.
> The other purpose of such API is for HDFS downstreamers without direct access 
> to namenode http port (only rpc port accessible) to retrieve slownodes.
> Moreover, 
> [FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
>  in HBase currently has to rely on it's own way of marking and excluding slow 
> nodes while 1) creating pipelines and 2) handling ack, based on factors like 
> the data length of the packet, processing time with last ack timestamp, 
> whether flush to replicas is finished etc. If it can utilize slownode API 
> from HDFS to exclude nodes appropriately while writing block, a lot of it's 
> own post-ack computation of slow nodes can be _saved_ or _improved_ or based 
> on further experiment, we could find _better solution_ to manage slow node 
> detection logic both in HDFS and HBase. However, in order to collect more 
> data points and run more POC around this area, HDFS should provide API for 
> downstreamers to efficiently utilize slownode info for such critical 
> low-latency use-case (like writing WALs).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes

Reply via email to