[
https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564543#action_12564543
]
Raghu Angadi commented on HADOOP-2346:
--------------------------------------
If I remember correctly, in many of the cases that Koji noticed, the DFSClient and
one or more datanodes were stuck on socket writes (while writing new blocks). What
happens is that when a datanode is writing block data to its local disk, the disk
write gets stuck forever (due to various kernel problems; the mount might have
spontaneously become read-only, for example). This blocked write to disk essentially
stalls the write pipeline from that datanode all the way up to the DFSClient.
This is pre HADOOP-1707. I am not sure whether the same condition still stalls the
write pipeline, but it might.
> DataNode should have timeout on socket writes.
> ----------------------------------------------
>
> Key: HADOOP-2346
> URL: https://issues.apache.org/jira/browse/HADOOP-2346
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.16.1
>
> Attachments: HADOOP-2346.patch, HADOOP-2346.patch
>
>
> If a client opens a file and stops reading in the middle, the DataNode thread
> writing the data could be stuck forever. For DataNode sockets we set a read
> timeout but not a write timeout. I think we should add a write(data, timeout)
> method in IOUtils that assumes the underlying FileChannel is non-blocking.
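For illustration, here is a minimal sketch of such a write-with-timeout helper; this
is not the attached patch, and the class and method names (TimedWriter.write) are
hypothetical. It assumes the channel is already configured non-blocking and uses a
Selector to bound how long each wait for writability may last.

    import java.io.IOException;
    import java.net.SocketTimeoutException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    public class TimedWriter {
      /**
       * Writes all of buf to channel, throwing SocketTimeoutException if the
       * channel stays unwritable for timeoutMs. The timeout bounds each wait,
       * not the total write. Assumes channel is already non-blocking.
       */
      public static void write(SocketChannel channel, ByteBuffer buf,
                               long timeoutMs) throws IOException {
        Selector selector = Selector.open();
        try {
          channel.register(selector, SelectionKey.OP_WRITE);
          while (buf.hasRemaining()) {
            if (channel.write(buf) > 0) {
              continue; // made progress; keep writing the rest
            }
            // Kernel send buffer is full: wait up to timeoutMs for writability.
            if (selector.select(timeoutMs) == 0) {
              throw new SocketTimeoutException(
                  timeoutMs + " ms timeout while waiting to write");
            }
            selector.selectedKeys().clear();
          }
        } finally {
          selector.close();
        }
      }
    }

The selector-based wait is needed because Socket.setSoTimeout only applies to reads;
plain Java has no equivalent knob for blocking writes, which is exactly the gap this
issue describes.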