[ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564543#action_12564543 ]

Raghu Angadi commented on HADOOP-2346:
--------------------------------------


If I remember correctly, in many of the cases Koji noticed, the DFSClient and one 
or more datanodes were stuck on socket writes (while writing new blocks). What 
happens is that while a datanode is writing block data to its local disk, the 
disk write gets stuck forever (due to various kernel problems... the mount might 
have spontaneously become read-only). This blocked disk write essentially stalls 
the write pipeline from that datanode all the way up to the DFSClient.

This was before HADOOP-1707. I am not sure whether the same condition still 
stalls the write pipeline; it might.


> DataNode should have timeout on socket writes.
> ----------------------------------------------
>
>                 Key: HADOOP-2346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.1
>
>         Attachments: HADOOP-2346.patch, HADOOP-2346.patch
>
>
> If a client opens a file and stops reading in the middle, the DataNode thread 
> writing the data could be stuck forever. For DataNode sockets we set a read 
> timeout but not a write timeout. I think we should add a write(data, timeout) 
> method in IOUtils that assumes the underlying FileChannel is non-blocking.
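
A minimal sketch of what such a timed write could look like, assuming a 
non-blocking SocketChannel driven by a Selector. Note that a FileChannel cannot 
actually be put in non-blocking mode, so it is the socket's channel that the 
timeout has to wrap; the class and method names below (TimedWriter, 
writeWithTimeout) are illustrative, not the actual patch:

    import java.io.IOException;
    import java.net.SocketTimeoutException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    public class TimedWriter {
      /**
       * Writes buf to channel, waiting at most timeoutMs for the socket
       * to become writable between partial writes. Throws
       * SocketTimeoutException instead of blocking forever when the
       * reader on the other end stops consuming data.
       */
      public static void writeWithTimeout(SocketChannel channel,
                                          ByteBuffer buf,
                                          long timeoutMs) throws IOException {
        // select() is only meaningful on a non-blocking channel.
        channel.configureBlocking(false);
        Selector selector = Selector.open();
        try {
          SelectionKey key = channel.register(selector, SelectionKey.OP_WRITE);
          while (buf.hasRemaining()) {
            // Block until the socket is writable, or give up after timeoutMs.
            if (selector.select(timeoutMs) == 0) {
              throw new SocketTimeoutException(
                  "write timed out after " + timeoutMs + " ms");
            }
            selector.selectedKeys().clear();
            channel.write(buf);  // may write fewer bytes than remaining
          }
          key.cancel();
        } finally {
          selector.close();
        }
      }
    }

The same pattern would cover the pipeline case described in the comment above: 
a datanode whose downstream write never completes gets a timeout exception 
rather than holding the whole pipeline, and the DFSClient, hostage.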

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
