[
https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=750038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-750038
]
ASF GitHub Bot logged work on HDFS-16521:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Mar/22 11:46
Start Date: 30/Mar/22 11:46
Worklog Time Spent: 10m
Work Description: virajjasani commented on pull request #4107:
URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1083038635
@iwasakims @ayushtkn I was initially thinking of adding SLOW_NODE to
`DatanodeReportType` so that ClientProtocol#getDatanodeReport could also handle
retrieval of slow nodes, but the server-side implementation becomes
considerably more complicated that way. To keep this a separate and clean
workflow, I added it as a new API in ClientProtocol; other than that, it is
quite similar to the getDatanodeReport() API.
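To make the two shapes being compared concrete, here is a minimal sketch. The
enum values mirror the existing `DatanodeReportType`; the `SLOW_NODE` constant
and the `getSlowDatanodeReport()` method are hypothetical names used for
illustration, not the committed Hadoop API.

```java
// Sketch only: SLOW_NODE and getSlowDatanodeReport() are hypothetical names
// illustrating the two options discussed above.
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

interface SlowNodeApiSketch {

  // Option A (considered, then set aside): extend the existing report type
  // enum so ClientProtocol#getDatanodeReport could also serve slow-node
  // queries. The NameNode-side handling gets more complicated this way.
  enum ReportTypeSketch {
    ALL, LIVE, DEAD, DECOMMISSIONING, ENTERING_MAINTENANCE, IN_MAINTENANCE,
    SLOW_NODE // hypothetical addition
  }

  // Option B (what the PR proposes): a dedicated call, analogous to
  // getDatanodeReport(), returning only datanodes currently flagged as slow.
  DatanodeInfo[] getSlowDatanodeReport() throws IOException;
}
```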
When HDFS throughput is affected, it would be really useful for operators to
check slow-node details with the `dfsadmin -report` command, similar to how it
already reports decommissioning, dead, and live nodes.
> How about enhancing metrics if the current information in the
SlowPeersReport is insufficient?
We could do this, but I believe exposing the extra slow-node information only
when required, i.e. through a user-triggered API (similar to the rest of
ClientProtocol), would have less overhead than continuously publishing the
additional details as metrics. WDYT?
> Thanks to
[JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java),
we can get metrics in JSON format via HTTP/HTTPS port of NameNode without
additional configuration.
Yes, this is certainly helpful, but only if the NameNode HTTP port is exposed
to the downstream application.
For instance, in a Kubernetes cluster, access to the NameNode ports might be
restricted to the namenode and datanode pods/containers, so other service pods
(e.g. HBase pods/containers) would have no access to the NameNode HTTP port and
hence no way to read the metric values. I agree that exposing metrics is good
for giving end customers a high-level view, but applications, depending on the
environment, may or may not have access to values derived from those metrics.
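For reference, here is a minimal sketch of pulling NameNode metrics as JSON
through the JMXJsonServlet mentioned in the quote above. The host name, the
9870 HTTP port, and the `qry` filter are assumptions to adjust per cluster,
and, as noted, this only works where the NameNode HTTP port is reachable.

```java
// Sketch only: fetch NameNode JMX metrics as JSON over HTTP.
// Host, port, and the qry filter below are illustrative assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NameNodeJmxFetch {
  public static void main(String[] args) throws Exception {
    // 9870 is the default NameNode HTTP port on recent Hadoop 3.x releases.
    // Drop the qry parameter to dump every registered bean.
    String endpoint = "http://namenode.example.com:9870/jmx"
        + "?qry=Hadoop:service=NameNode,name=*";
    HttpURLConnection conn =
        (HttpURLConnection) new URL(endpoint).openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // raw JSON; parse with any JSON library
      }
    } finally {
      conn.disconnect();
    }
  }
}
```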
Issue Time Tracking
-------------------
Worklog Id: (was: 750038)
Time Spent: 1h 20m (was: 1h 10m)
> DFS API to retrieve slow datanodes
> ----------------------------------
>
> Key: HDFS-16521
> URL: https://issues.apache.org/jira/browse/HDFS-16521
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In order to build automation around slow datanodes that regularly show up in
> the slow peer tracking report (e.g. decommission such nodes, queue them for
> external processing, and add them back to the cluster after the issues are
> fixed), we should expose a DFS API to retrieve all slow nodes at a given
> time.
> Providing such an API would also make it possible to add an option to
> "dfsadmin -report" that lists slow datanode info for operators to review, a
> particularly useful filter on larger clusters.
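The automation the description sketches could look roughly like the following.
This is a sketch under the assumption that a slow-node retrieval method (here
called `getSlowDatanodeStats()`, a hypothetical name) exists on
`DistributedFileSystem`; the decommission and re-add steps are placeholders
for whatever mechanism a deployment uses (e.g. exclude files plus
`refreshNodes`, or an external orchestrator).

```java
// Sketch only: periodic sweep that lists slow datanodes and hands them off to
// external tooling. getSlowDatanodeStats() is a hypothetical method name.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class SlowNodeSweep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        FileSystem.get(URI.create("hdfs://nn.example.com:8020"), conf);

    // Hypothetical call mirroring the proposed ClientProtocol addition.
    DatanodeInfo[] slowNodes = dfs.getSlowDatanodeStats();

    for (DatanodeInfo dn : slowNodes) {
      System.out.println("Slow datanode: " + dn.getHostName());
      // 1. queue dn for decommission (exclude file / orchestrator) -- placeholder
      // 2. hand off to external diagnosis/repair tooling          -- placeholder
      // 3. re-add to the cluster once fixed                       -- placeholder
    }
  }
}
```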