[ https://issues.apache.org/jira/browse/HDFS-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani updated HDFS-16521:
--------------------------------
    Description: 
Providing a DFS API to retrieve slow nodes would help add an additional option to 
"dfsadmin -report" that lists slow datanode info for operators to review, a 
particularly useful filter for larger clusters.
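
For illustration only (the exact option name is still open), such a filter could 
be invoked as "hdfs dfsadmin -report -slownodes", alongside the existing 
-live/-dead/-decommissioning filters that "dfsadmin -report" already supports.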

The other purpose of such an API is to let HDFS downstream projects that have no 
direct access to the namenode http port (only the rpc port is accessible) 
retrieve slow nodes.
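
As a minimal sketch of what that could look like on the client side, assuming 
the API surfaces as something like getSlowDatanodeStats() on 
DistributedFileSystem returning DatanodeInfo[] (the method name and return type 
here are illustrative, not a settled signature):
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class SlowNodeReportExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Only the NameNode RPC endpoint is needed; no http/jmx access required.
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://nn-host:8020"), conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // Hypothetical API: datanodes the NameNode currently flags as slow.
      DatanodeInfo[] slowNodes = dfs.getSlowDatanodeStats();
      for (DatanodeInfo dn : slowNodes) {
        System.out.println(dn.getHostName() + " " + dn.getXferAddr());
      }
    }
  }
}
{code}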

Moreover, 
[FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
in HBase currently has to rely on its own way of marking and excluding slow 
nodes while 1) creating pipelines and 2) handling acks, based on factors like 
the data length of the packet, the processing time relative to the last ack 
timestamp, whether the flush to replicas has finished, etc. If it could utilize 
a slownode API from HDFS to exclude nodes appropriately while writing blocks, 
much of its own post-ack computation of slow nodes could be _saved_ or 
_improved_, or, based on further experiments, we could find a _better solution_ 
to manage slow node detection logic in both HDFS and HBase. However, in order 
to collect more data points and run more POCs in this area, HDFS should provide 
an API for downstream projects to efficiently utilize slownode info for such 
critical low-latency use cases (like writing WALs).
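
To make the intended downstream usage concrete, here is a rough sketch (not 
HBase's actual code; getSlowDatanodeStats() is the assumed API from the sketch 
above) of folding the NameNode-reported slow nodes into the exclusion set a 
client already maintains before requesting a new block/pipeline:
{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public final class SlowNodeExclusion {

  private SlowNodeExclusion() {
  }

  /**
   * Merge NameNode-reported slow nodes into the client's existing exclusion
   * set, so they can be passed as excluded nodes when a new block/pipeline is
   * requested, instead of being discovered only via post-ack latency tracking.
   */
  public static Set<DatanodeInfo> withReportedSlowNodes(
      DistributedFileSystem dfs, Set<DatanodeInfo> alreadyExcluded)
      throws IOException {
    Set<DatanodeInfo> excluded = new HashSet<>(alreadyExcluded);
    for (DatanodeInfo dn : dfs.getSlowDatanodeStats()) {
      excluded.add(dn);
    }
    return excluded;
  }
}
{code}
Whether such a report would be refreshed per block allocation or cached with a 
short TTL is exactly the kind of question the experiments mentioned above 
should answer.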

  was:
Providing a DFS API to retrieve slow nodes would help add an additional option to 
"dfsadmin -report" that lists slow datanode info for operators to review, a 
particularly useful filter for larger clusters.

The other purpose of such an API is to let HDFS downstream projects that have no 
direct access to the namenode http port (only the rpc port is accessible) 
retrieve slow nodes.

Moreover, 
[FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
in HBase currently has to rely on its own way of marking and excluding slow 
nodes while 1) creating pipelines and 2) handling acks, based on factors like 
the data length of the packet, the processing time relative to the last ack 
timestamp, whether the flush to replicas has finished, etc. If it could utilize 
a slownode API from HDFS to exclude nodes appropriately while writing blocks, 
much of its own post-ack computation of slow nodes could be _saved_ or 
_improved_, or, based on further experiments, we could find a _better solution_ 
to manage slow node detection logic in both HDFS and HBase. However, in order 
to collect more data points and run more POCs in this area, at least we should 
expect HDFS to provide an API for downstream projects to efficiently utilize 
slownode info for such critical low-latency use cases (like writing WALs).


> DFS API to retrieve slow datanodes
> ----------------------------------
>
>                 Key: HDFS-16521
>                 URL: https://issues.apache.org/jira/browse/HDFS-16521
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Providing a DFS API to retrieve slow nodes would help add an additional option 
> to "dfsadmin -report" that lists slow datanode info for operators to review, a 
> particularly useful filter for larger clusters.
> The other purpose of such an API is to let HDFS downstream projects that have 
> no direct access to the namenode http port (only the rpc port is accessible) 
> retrieve slow nodes.
> Moreover, 
> [FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
> in HBase currently has to rely on its own way of marking and excluding slow 
> nodes while 1) creating pipelines and 2) handling acks, based on factors like 
> the data length of the packet, the processing time relative to the last ack 
> timestamp, whether the flush to replicas has finished, etc. If it could 
> utilize a slownode API from HDFS to exclude nodes appropriately while writing 
> blocks, much of its own post-ack computation of slow nodes could be _saved_ 
> or _improved_, or, based on further experiments, we could find a _better 
> solution_ to manage slow node detection logic in both HDFS and HBase. 
> However, in order to collect more data points and run more POCs in this area, 
> HDFS should provide an API for downstream projects to efficiently utilize 
> slownode info for such critical low-latency use cases (like writing WALs).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
