[ https://issues.apache.org/jira/browse/HDFS-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-915:
-----------------------------
Attachment: hdfs-915-0.20.txt
Here's a patch that we've tested for a long time in a 0.20-based build. We
still need to check whether it's relevant for branch-1 and trunk, and to add
a test case.
> Hung DN stalls write pipeline for far longer than its timeout
> -------------------------------------------------------------
>
> Key: HDFS-915
> URL: https://issues.apache.org/jira/browse/HDFS-915
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 0.20.1
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-915-0.20.txt, local-dn.log
>
>
> After running kill -STOP on the datanode in the middle of a write pipeline,
> the client takes far longer to recover than it should. The ResponseProcessor
> times out at the correct interval but doesn't interrupt the DataStreamer,
> which does not appear to be subject to the same timeout. The client only
> recovers once the OS actually declares the TCP stream dead, which can take a
> very long time.
> I've experienced this on 0.20.1 but haven't tried it yet on trunk or 0.21.
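
The failure mode described above can be reproduced outside HDFS with plain
java.net sockets. The following is a minimal sketch only: it is not the
DFSClient code and not the attached patch, and the class and method names
(HungPeerSketch, writerUnblocksAfterReadTimeout) are made up for
illustration. It assumes that a peer which accepts a connection but never
reads behaves like a datanode stopped with kill -STOP. The reading thread
hits its read timeout on schedule, while the writing thread stays blocked
until something closes the socket underneath it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; names are illustrative, not from HDFS.
final class HungPeerSketch {

    /** Returns true if the blocked writer was released after the read timeout fired. */
    static boolean writerUnblocksAfterReadTimeout(int readTimeoutMs) throws Exception {
        try (ServerSocket hungPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(hungPeer.getInetAddress(), hungPeer.getLocalPort());
            // Accepted but never read from: stands in for the stopped datanode.
            Socket accepted = hungPeer.accept();
            // SO_TIMEOUT only bounds reads on this socket, not writes.
            client.setSoTimeout(readTimeoutMs);

            CountDownLatch writerDone = new CountDownLatch(1);
            // Plays the role of the DataStreamer: keeps pushing packets.
            Thread streamer = new Thread(() -> {
                byte[] packet = new byte[64 * 1024];
                try {
                    OutputStream out = client.getOutputStream();
                    while (true) {
                        out.write(packet); // blocks once the socket buffers fill up
                    }
                } catch (IOException e) {
                    // Reached only when the socket is closed underneath the writer.
                } finally {
                    writerDone.countDown();
                }
            });
            streamer.setDaemon(true);
            streamer.start();

            // Plays the role of the ResponseProcessor: waits for an ack that never comes.
            try {
                client.getInputStream().read();
            } catch (SocketTimeoutException e) {
                // Closing the socket makes the blocked write throw a SocketException.
                // Without this step the writer would stay blocked.
                client.close();
            }

            boolean released = writerDone.await(5, TimeUnit.SECONDS);
            accepted.close();
            return released;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer released: " + writerUnblocksAfterReadTimeout(500));
    }
}
```

Removing the client.close() call in the catch block leaves the writer thread
blocked, which mirrors the stall described in this issue. Whether closing the
socket is the right recovery step inside DFSClient is a separate question
that this sketch does not try to answer.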