[jira] [Commented] (HDFS-3751) DN should log warnings for lengthy disk IOs

Todd Lipcon (JIRA) Thu, 02 Aug 2012 09:30:03 -0700

    [ https://issues.apache.org/jira/browse/HDFS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427421#comment-13427421 ]


Todd Lipcon commented on HDFS-3751:
-----------------------------------

Hey Bobby. We recently added metrics for these timings (HDFS-3170) and now 
calculate quantiles for them as well (HDFS-3650). I agree it would be nice to 
track them dynamically per mount, but I think that's a bit more complicated 
than the simple warning proposed here.

We used a hacked-up version of this proposed patch on a customer workload, and 
even the really simple logging was super helpful. Most people already have a 
way of grepping logs for key warning messages to trigger alerts, so even 
without Hadoop-side support for aggregating and counting these metrics, I 
think this should go in. Then let's file a separate JIRA to collect per-disk 
metrics using the metrics2 dynamic metrics support.
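To illustrate what "per-disk" aggregation could look like, here is a rough sketch that keeps a per-volume count of slow IOs. It does NOT use the metrics2 API. The class name `PerVolumeSlowIoCounter` and its methods are hypothetical, and keying on the mount path is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-volume tally of slow IOs; NOT the Hadoop metrics2 API. */
public class PerVolumeSlowIoCounter {
    // One counter per volume (assumed to be keyed by mount path, e.g. /data/1/dfs/dn).
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Records one slow IO against the given volume. */
    public void recordSlowIo(String volume) {
        // computeIfAbsent creates the counter the first time a volume is seen.
        counts.computeIfAbsent(volume, v -> new LongAdder()).increment();
    }

    /** Returns the number of slow IOs seen on a volume, or 0 if none. */
    public long count(String volume) {
        LongAdder adder = counts.get(volume);
        return adder == null ? 0L : adder.sum();
    }

    /** Point-in-time snapshot sorted by volume, e.g. for periodic reporting. */
    public Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counts.forEach((volume, adder) -> out.put(volume, adder.sum()));
        return out;
    }
}
```

`LongAdder` keeps increments cheap when many DataXceiver threads hit the same volume. A real implementation would register dynamic metrics per volume instead of exposing a map.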
                
> DN should log warnings for lengthy disk IOs
> -------------------------------------------
>
>                 Key: HDFS-3751
>                 URL: https://issues.apache.org/jira/browse/HDFS-3751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 1.2.0, 2.1.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>
> Occasionally, failing disks or other OS-and-below issues cause a single IO to 
> take tens of seconds, or even minutes in the case of failures. This often 
> results in timeout exceptions on the client side, which are hard to diagnose. 
> It would be easier to root-cause these issues if the DN logged a WARN like 
> "IO of 64KB to volume /data/1/dfs/dn for block 12345234 client 1.2.3.4 took 
> 61.3 seconds" or some such.
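A minimal sketch of the proposed warning: time each disk IO with a monotonic clock and log a WARN when it exceeds a threshold. The class name `SlowIoWarner`, the threshold value, and the use of `java.util.logging` are assumptions for illustration. The actual patch may differ and would presumably use Hadoop's own logging and a configurable threshold.

```java
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

/** Hypothetical helper: times a disk IO and logs a warning if it is slow. */
public class SlowIoWarner {
    private static final Logger LOG = Logger.getLogger(SlowIoWarner.class.getName());

    // Hypothetical threshold in milliseconds; not an actual HDFS config key.
    private final long thresholdMs;

    public SlowIoWarner(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** True if an IO that took elapsedMs should trigger a warning. */
    public boolean isSlow(long elapsedMs) {
        return elapsedMs > thresholdMs;
    }

    /** Builds a message in the style suggested in the issue description. */
    public static String formatWarning(long bytes, String volume, long blockId,
                                       String client, long elapsedMs) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale.
        return String.format(Locale.ROOT,
                "IO of %dKB to volume %s for block %d client %s took %.1f seconds",
                bytes / 1024, volume, blockId, client, elapsedMs / 1000.0);
    }

    /** Runs the IO, measures its duration, and warns if it was slow. */
    public <T> T timeIo(Callable<T> io, long bytes, String volume,
                        long blockId, String client) throws Exception {
        long start = System.nanoTime(); // monotonic, unaffected by wall-clock jumps
        try {
            return io.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            if (isSlow(elapsedMs)) {
                LOG.warning(formatWarning(bytes, volume, blockId, client, elapsedMs));
            }
        }
    }
}
```

Logging from the `finally` block means an IO that eventually fails after a long stall still produces the warning, which is the failing-disk case the issue is most concerned with.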


