[
https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=750038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-750038
]
ASF GitHub Bot logged work on HDFS-16521:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Mar/22 11:46
Start Date: 30/Mar/22 11:46
Worklog Time Spent: 10m
Work Description: virajjasani commented on pull request #4107:
URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1083038635
@iwasakims @ayushtkn I was initially thinking of adding SLOW_NODE to
`DatanodeReportType` so that ClientProtocol#getDatanodeReport could also handle
retrieval of slow nodes, but the server-side implementation becomes
considerably more complicated that way. To keep this a separate and clean
workflow, I added it as a new API in ClientProtocol; other than that, it is
quite similar to the getDatanodeReport() API.
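To make the two shapes being compared concrete, here is a minimal sketch. The
enum values mirror the existing `DatanodeReportType`; the `SLOW_NODE` constant
and the `getSlowDatanodeReport()` method are hypothetical names used for
illustration, not the committed Hadoop API.

```java
// Sketch only: SLOW_NODE and getSlowDatanodeReport() are hypothetical names
// illustrating the two options discussed above.
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

interface SlowNodeApiSketch {

  // Option A (considered, then set aside): extend the existing report type
  // enum so ClientProtocol#getDatanodeReport could also serve slow-node
  // queries. The NameNode-side handling gets more complicated this way.
  enum ReportTypeSketch {
    ALL, LIVE, DEAD, DECOMMISSIONING, ENTERING_MAINTENANCE, IN_MAINTENANCE,
    SLOW_NODE // hypothetical addition
  }

  // Option B (what the PR proposes): a dedicated call, analogous to
  // getDatanodeReport(), returning only datanodes currently flagged as slow.
  DatanodeInfo[] getSlowDatanodeReport() throws IOException;
}
```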
When HDFS throughput is affected, it would be really useful for operators to
check slow-node details with the `dfsadmin -report` command, similar to how it
already reports decommissioning, dead, and live nodes.
> How about enhancing metrics if the current information in the
SlowPeersReport is insufficient?
We could do this, but I believe exposing the extra slow-node information only
when required, i.e. through a user-triggered API (similar to the rest of
ClientProtocol), would have less overhead than continuously publishing the
additional details as metrics. WDYT?
> Thanks to
[JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java),
we can get metrics in JSON format via HTTP/HTTPS port of NameNode without
additional configuration.
Yes, this is certainly helpful, but only if the NameNode HTTP port is exposed
to the downstream application.
For instance, in a Kubernetes cluster, access to the NameNode ports might be
restricted to the namenode and datanode pods/containers, so other service pods
(e.g. HBase pods/containers) would have no access to the NameNode HTTP port and
hence no way to read the metric values. I agree that exposing metrics is good
for giving end customers a high-level view, but applications, depending on the
environment, may or may not have access to values derived from those metrics.
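For reference, here is a minimal sketch of pulling NameNode metrics as JSON
through the JMXJsonServlet mentioned in the quote above. The host name, the
9870 HTTP port, and the `qry` filter are assumptions to adjust per cluster,
and, as noted, this only works where the NameNode HTTP port is reachable.

```java
// Sketch only: fetch NameNode JMX metrics as JSON over HTTP.
// Host, port, and the qry filter below are illustrative assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NameNodeJmxFetch {
  public static void main(String[] args) throws Exception {
    // 9870 is the default NameNode HTTP port on recent Hadoop 3.x releases.
    // Drop the qry parameter to dump every registered bean.
    String endpoint = "http://namenode.example.com:9870/jmx"
        + "?qry=Hadoop:service=NameNode,name=*";
    HttpURLConnection conn =
        (HttpURLConnection) new URL(endpoint).openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // raw JSON; parse with any JSON library
      }
    } finally {
      conn.disconnect();
    }
  }
}
```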
Issue Time Tracking
-------------------
Worklog Id: (was: 750038)
Time Spent: 1h 20m (was: 1h 10m)
> DFS API to retrieve slow datanodes
> ----------------------------------
>
> Key: HDFS-16521
> URL: https://issues.apache.org/jira/browse/HDFS-16521
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In order to build automation around slow datanodes that regularly show up in
> the slow peer tracking report (e.g. decommission such nodes, queue them for
> external processing, and add them back to the cluster after the issues are
> fixed), we should expose a DFS API to retrieve all slow nodes at a given
> time.
> Providing such an API would also make it possible to add an option to
> "dfsadmin -report" that lists slow datanode info for operators to review, a
> particularly useful filter on larger clusters.
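The automation the description sketches could look roughly like the following.
This is a sketch under the assumption that a slow-node retrieval method (here
called `getSlowDatanodeStats()`, a hypothetical name) exists on
`DistributedFileSystem`; the decommission and re-add steps are placeholders
for whatever mechanism a deployment uses (e.g. exclude files plus
`refreshNodes`, or an external orchestrator).

```java
// Sketch only: periodic sweep that lists slow datanodes and hands them off to
// external tooling. getSlowDatanodeStats() is a hypothetical method name.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class SlowNodeSweep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        FileSystem.get(URI.create("hdfs://nn.example.com:8020"), conf);

    // Hypothetical call mirroring the proposed ClientProtocol addition.
    DatanodeInfo[] slowNodes = dfs.getSlowDatanodeStats();

    for (DatanodeInfo dn : slowNodes) {
      System.out.println("Slow datanode: " + dn.getHostName());
      // 1. queue dn for decommission (exclude file / orchestrator) -- placeholder
      // 2. hand off to external diagnosis/repair tooling          -- placeholder
      // 3. re-add to the cluster once fixed                       -- placeholder
    }
  }
}
```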