[jira] [Updated] (HDFS-915) Hung DN stalls write pipeline for far longer than its timeout

Todd Lipcon (Updated) (JIRA) Wed, 14 Mar 2012 13:13:02 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Todd Lipcon updated HDFS-915:
-----------------------------

    Attachment: hdfs-915-0.20.txt

Here's a patch that we've tested for a long time in a 0.20-based build. We 
still need to re-investigate whether the problem is relevant for branch-1 and 
trunk, and to add a test case.
                
> Hung DN stalls write pipeline for far longer than its timeout
> -------------------------------------------------------------
>
>                 Key: HDFS-915
>                 URL: https://issues.apache.org/jira/browse/HDFS-915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-915-0.20.txt, local-dn.log
>
>
> After running kill -STOP on the datanode in the middle of a write pipeline, 
> the client takes far longer to recover than it should. The ResponseProcessor 
> times out at the correct interval, but doesn't interrupt the DataStreamer, 
> which does not appear to be subject to the same timeout. The client only 
> recovers once the OS actually declares the TCP stream dead, which can take a 
> very long time.
> I've experienced this on 0.20.1, but haven't tried it yet on trunk or 0.21.
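
The failure mode described above can be illustrated with a toy model. This is a minimal sketch only: it is not the HDFS client code and not the attached patch. The class name PipelineTimeoutSketch, the method streamerRecovers, and the thread names are hypothetical, and a CountDownLatch stands in for the blocked socket write (a real blocking socket write does not necessarily respond to Thread.interrupt() the way a latch does). One thread plays the role of the DataStreamer stuck on a hung peer, the other plays the ResponseProcessor whose ack timeout fires on schedule. The streamer only wakes up if the responder propagates its timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PipelineTimeoutSketch {

    /**
     * Runs a toy "streamer" that blocks indefinitely (standing in for a write
     * to a hung datanode) and a toy "responder" that waits ackTimeoutMs for an
     * ack that never arrives. If interruptStreamer is true, the responder
     * interrupts the streamer when its own timeout fires.
     *
     * Returns true if the streamer was unblocked within waitMs.
     */
    static boolean streamerRecovers(long ackTimeoutMs, boolean interruptStreamer, long waitMs)
            throws InterruptedException {
        CountDownLatch neverCompletes = new CountDownLatch(1);    // the "hung" write
        CountDownLatch neverAcked = new CountDownLatch(1);        // the missing ack
        CountDownLatch streamerUnblocked = new CountDownLatch(1); // signals wake-up

        Thread streamer = new Thread(() -> {
            try {
                neverCompletes.await(); // blocks like a write to a stopped peer
            } catch (InterruptedException e) {
                streamerUnblocked.countDown(); // woken up: recovery could start here
            }
        }, "toy-streamer");

        Thread responder = new Thread(() -> {
            try {
                boolean acked = neverAcked.await(ackTimeoutMs, TimeUnit.MILLISECONDS);
                if (!acked && interruptStreamer) {
                    streamer.interrupt(); // propagate the timeout to the writer
                }
            } catch (InterruptedException ignored) {
                // not expected in this sketch
            }
        }, "toy-responder");

        streamer.setDaemon(true);
        responder.setDaemon(true);
        streamer.start();
        responder.start();

        boolean recovered = streamerUnblocked.await(waitMs, TimeUnit.MILLISECONDS);
        streamer.interrupt(); // clean up the toy thread in the non-recovering case
        return recovered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without interrupt: " + streamerRecovers(100, false, 500));
        System.out.println("with interrupt:    " + streamerRecovers(100, true, 500));
    }
}
```

With interruptStreamer set to false the responder times out on schedule but the streamer stays blocked, which mirrors the behaviour reported in this issue. With it set to true the streamer should wake shortly after the ack timeout expires.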

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
