[
https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=762734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762734
]
ASF GitHub Bot logged work on HDFS-16521:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 06:42
Start Date: 27/Apr/22 06:42
Worklog Time Spent: 10m
Work Description: virajjasani commented on code in PR #4107:
URL: https://github.com/apache/hadoop/pull/4107#discussion_r859427464
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:
##########
@@ -433,7 +433,7 @@ static int run(DistributedFileSystem dfs, String[] argv, int idx) throws IOException
*/
private static final String commonUsageSummary =
"\t[-report [-live] [-dead] [-decommissioning] " +
- "[-enteringmaintenance] [-inmaintenance]]\n" +
+ "[-enteringmaintenance] [-inmaintenance] [-slownodes]]\n" +
Review Comment:
Regarding the command options, I believe filters can ideally be used for both: 1)
the state of DNs (decommissioning, dead, live, etc.) and 2) the nature of DNs (slow
outliers). Updated the doc, please review.
Thanks
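For reference, once this lands the new filter should compose with the existing
state filters on the command line, e.g. `hdfs dfsadmin -report -slownodes` on its
own, or `hdfs dfsadmin -report -live -slownodes` alongside a state filter
(illustrative invocations, not taken from the patch).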
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:
##########
@@ -632,6 +638,20 @@ private static void printDataNodeReports(DistributedFileSystem dfs,
   }
 }
+ private static void printSlowDataNodeReports(DistributedFileSystem dfs,
+     boolean listNodes,
Review Comment:
> I suspect you would need some kind of header to distinguish from the other data node reports.

This is called only if the condition `listAll || listSlowNodes` is true:
```
if (listAll || listSlowNodes) {
printSlowDataNodeReports(dfs, listSlowNodes, "Slow");
}
```
Sample output (screenshot, "Screenshot 2022-03-25 at 9 12 58 PM"):
https://user-images.githubusercontent.com/34790606/165455352-303eb506-0a5f-491d-ac44-bcc243a8f0f6.png
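For reviewers skimming the thread, here is a minimal sketch of the shape of such
a printer. The `getSlowDatanodeStats()` accessor on `DistributedFileSystem` and
the exact output format are assumptions for illustration, not lifted from the
patch:
```
import java.io.IOException;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

class SlowNodeReportSketch {
  // Sketch only: getSlowDatanodeStats() is a hypothetical accessor
  // mirroring the existing DistributedFileSystem#getDataNodeStats().
  static void printSlowDataNodeReports(DistributedFileSystem dfs,
      boolean listNodes, String nodeStateHeader) throws IOException {
    DatanodeInfo[] slow = dfs.getSlowDatanodeStats();
    if (slow.length > 0 || listNodes) {
      // The header distinguishes this section from the other datanode reports.
      System.out.println(nodeStateHeader + " datanodes (" + slow.length + "):\n");
      for (DatanodeInfo dn : slow) {
        // DatanodeInfo#getDatanodeReport() renders the standard per-node dump.
        System.out.println(dn.getDatanodeReport());
      }
    }
  }
}
```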
##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java:
##########
@@ -1868,4 +1868,16 @@ BatchedEntries<OpenFileEntry> listOpenFiles(long prevId,
*/
@AtMostOnce
void satisfyStoragePolicy(String path) throws IOException;
+
+ /**
+ * Get report on all of the slow Datanodes. Slow running datanodes are
+ * identified based on the Outlier detection algorithm, if slow peer
+ * tracking is enabled for the DFS cluster.
+ *
+ * @return Datanode report for slow running datanodes.
+ * @throws IOException If an I/O error occurs.
+ */
+ @Idempotent
+ @ReadOnly
+ DatanodeInfo[] getSlowDatanodeReport() throws IOException;
Review Comment:
I thought a List would also be fine, but kept it an array to keep the API contract
in line with `getDatanodeReport()`, so that both APIs can use the same underlying
utility methods (e.g. `getDatanodeInfoFromDescriptors()`).
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########
@@ -4914,6 +4914,33 @@ int getNumberOfDatanodes(DatanodeReportType type) {
}
}
+ DatanodeInfo[] slowDataNodesReport() throws IOException {
+ String operationName = "slowDataNodesReport";
+ DatanodeInfo[] datanodeInfos;
+ checkSuperuserPrivilege(operationName);
Review Comment:
Not really needed; removed, thanks.
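To make the resulting shape concrete, a hedged sketch of the handler as it would
sit inside `FSNamesystem` after dropping the superuser check;
`DatanodeManager#getAllSlowDataNodes()` is an assumed helper name, and
`getDatanodeInfoFromDescriptors()` is the shared utility mentioned above:
```
// Sketch of FSNamesystem#slowDataNodesReport() without the superuser check;
// helper names beyond getDatanodeInfoFromDescriptors() are assumptions.
DatanodeInfo[] slowDataNodesReport() throws IOException {
  final String operationName = "slowDataNodesReport";
  DatanodeInfo[] datanodeInfos;
  checkOperation(OperationCategory.UNCHECKED);
  readLock();
  try {
    checkOperation(OperationCategory.UNCHECKED);
    final DatanodeManager dm = getBlockManager().getDatanodeManager();
    // Assumed helper returning descriptors flagged by the outlier detector.
    final List<DatanodeDescriptor> results = dm.getAllSlowDataNodes();
    datanodeInfos = getDatanodeInfoFromDescriptors(results);
  } finally {
    readUnlock(operationName);
  }
  logAuditEvent(true, operationName, null);
  return datanodeInfos;
}
```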
Issue Time Tracking
-------------------
Worklog Id: (was: 762734)
Time Spent: 3h 40m (was: 3.5h)
> DFS API to retrieve slow datanodes
> ----------------------------------
>
> Key: HDFS-16521
> URL: https://issues.apache.org/jira/browse/HDFS-16521
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Providing a DFS API to retrieve slow nodes would help add an additional option
> to "dfsadmin -report" that lists slow datanode info for operators to review, a
> specifically useful filter for larger clusters.
> The other purpose of such an API is to let HDFS downstream projects without
> direct access to the namenode http port (only the rpc port accessible)
> retrieve slownodes.
> Moreover,
> [FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
> in HBase currently has to rely on its own way of marking and excluding slow
> nodes while 1) creating pipelines and 2) handling acks, based on factors like
> the data length of the packet, processing time relative to the last ack
> timestamp, and whether the flush to replicas is finished. If it could utilize
> the slownode API from HDFS to exclude nodes appropriately while writing
> blocks, a lot of its own post-ack computation of slow nodes could be _saved_
> or _improved_; or, based on further experiments, we could find a _better
> solution_ for managing slow node detection logic in both HDFS and HBase.
> However, in order to collect more data points and run more POCs in this area,
> HDFS should provide an API for downstream projects to efficiently utilize
> slownode info for such critical low-latency use cases (like writing WALs).