[ 
https://issues.apache.org/jira/browse/HDFS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427320#comment-13427320
 ] 

Robert Joseph Evans commented on HDFS-3751:
-------------------------------------------

If we are collecting this data in order to output a warning, it would be good 
to also keep metrics for each disk.  That would let an admin review the 
per-disk metrics in the future and look for outliers.  They could then 
investigate further and possibly remove the failing disk.
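
As a rough illustration of the idea, here is a minimal sketch of per-volume 
latency counters kept alongside the slow-IO warning.  The class name 
VolumeIoStats, its methods, the threshold, and the median-based outlier rule 
are all hypothetical, chosen only for this example; none of it is existing 
DataNode code or a Hadoop API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-volume IO latency tracker; not part of Hadoop. */
class VolumeIoStats {
    private static final class Counters {
        final LongAdder ops = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
        final LongAdder slowOps = new LongAdder();
    }

    private final long slowThresholdMillis;
    private final Map<String, Counters> byVolume = new ConcurrentHashMap<>();

    VolumeIoStats(long slowThresholdMillis) {
        this.slowThresholdMillis = slowThresholdMillis;
    }

    /** Records one IO. Returns a warning message if it was slow, else null. */
    String record(String volume, long bytes, long elapsedMillis) {
        Counters c = byVolume.computeIfAbsent(volume, v -> new Counters());
        c.ops.increment();
        c.totalMillis.add(elapsedMillis);
        if (elapsedMillis < slowThresholdMillis) {
            return null;
        }
        c.slowOps.increment();
        return String.format(Locale.ROOT,
            "IO of %d bytes to volume %s took %.1f seconds",
            bytes, volume, elapsedMillis / 1000.0);
    }

    /** Mean latency in milliseconds for a volume, or 0 if nothing was recorded. */
    double meanMillis(String volume) {
        Counters c = byVolume.get(volume);
        long ops = (c == null) ? 0 : c.ops.sum();
        return ops == 0 ? 0.0 : (double) c.totalMillis.sum() / ops;
    }

    /** Number of IOs on a volume that met or exceeded the slow threshold. */
    long slowOps(String volume) {
        Counters c = byVolume.get(volume);
        return c == null ? 0 : c.slowOps.sum();
    }

    /** Volumes whose mean latency exceeds factor times the median of all per-volume means. */
    List<String> outliers(double factor) {
        List<Double> means = new ArrayList<>();
        for (String v : byVolume.keySet()) {
            means.add(meanMillis(v));
        }
        if (means.isEmpty()) {
            return new ArrayList<>();
        }
        Collections.sort(means);
        int n = means.size();
        double median = (n % 2 == 1)
            ? means.get(n / 2)
            : (means.get(n / 2 - 1) + means.get(n / 2)) / 2.0;
        List<String> result = new ArrayList<>();
        for (String v : byVolume.keySet()) {
            if (meanMillis(v) > factor * median) {
                result.add(v);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

The caller would log the returned message at WARN and expose the counters 
through whatever metrics system is in use.  Comparing against the median 
rather than the overall mean keeps one very slow disk from hiding itself by 
dragging the average up.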
                
> DN should log warnings for lengthy disk IOs
> -------------------------------------------
>
>                 Key: HDFS-3751
>                 URL: https://issues.apache.org/jira/browse/HDFS-3751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 1.2.0, 2.1.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>
> Occasionally failing disks or other OS-and-below issues cause a single IO to 
> take tens of seconds, or even minutes in the case of failures. This often 
> results in timeout exceptions at the client side which are hard to diagnose. 
> It would be easier to root-cause these issues if the DN logged a WARN like 
> "IO of 64kb to volume /data/1/dfs/dn for block 12345234 client 1.2.3.4 took 
> 61.3 seconds" or somesuch.

        

Reply via email to