[ 
https://issues.apache.org/jira/browse/HADOOP-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588135#action_12588135
 ] 

rangadi edited comment on HADOOP-3234 at 4/11/08 2:33 PM:
---------------------------------------------------------------

This is not an issue when using non-blocking I/O. It looks like reads and writes 
on regular (blocking) sockets are not interruptible (really?). So this will be a 
very rare problem once HADOOP-3124 is committed, arising only when 
"dfs.datanode.socket.write.timeout" is set to 0 and something like HADOOP-3132 
happens. On 0.16, it is not much of an issue since there is no write timeout at all.
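
To illustrate the non-blocking I/O point above, here is a minimal standalone sketch (not Hadoop code; the class and method names are hypothetical). It assumes a plain java.nio SocketChannel on the loopback interface and shows that a thread blocked in a channel read is released by Thread.interrupt() with a ClosedByInterruptException. A read on a plain java.net.Socket stream is generally not released this way and usually needs the socket to be closed or a read timeout to be set.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of Hadoop.
class InterruptDemo {

    /**
     * Blocks a thread in SocketChannel.read() against a peer that never
     * sends data, interrupts that thread, and reports whether the read was
     * aborted with ClosedByInterruptException.
     */
    static boolean channelReadInterrupted() throws Exception {
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free ephemeral port.
            server.bind(new InetSocketAddress("127.0.0.1", 0));

            Thread reader = new Thread(() -> {
                try (SocketChannel ch = SocketChannel.open(server.getLocalAddress())) {
                    // Blocks here: the peer never writes anything.
                    ch.read(ByteBuffer.allocate(16));
                } catch (ClosedByInterruptException e) {
                    // The interrupt closed the channel and woke the thread.
                    interrupted.set(true);
                } catch (IOException e) {
                    // Any other I/O failure leaves the flag false.
                }
            });
            reader.start();

            try (SocketChannel peer = server.accept()) {
                Thread.sleep(200);   // give the reader time to block in read()
                reader.interrupt();
                reader.join(5000);   // bounded wait so the demo cannot hang
            }
        }
        return interrupted.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("channel read interrupted: " + channelReadInterrupted());
    }
}
```

The interrupt also closes the channel, so code relying on this behavior has to treat the connection as lost rather than retry on the same channel.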

      was (Author: rangadi):
    This is not an issue with non-blocking I/O. It looks like reads and writes on 
regular (blocking) sockets are not interruptible (really?). So this will be a very 
rare problem once HADOOP-3124 is committed, arising only when 
"dfs.datanode.socket.write.timeout" is set to 0 and something like HADOOP-3132 
happens. On 0.16, it is not much of an issue since there is no write timeout at all.
  
> Write pipeline does not recover from first node failure sometimes.
> ------------------------------------------------------------------
>
>                 Key: HADOOP-3234
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3234
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.17.0
>
>
> While investigating HADOOP-3132, we had a misconfiguration that resulted in 
> the client writing to the first datanode in the pipeline with a 15-second 
> write timeout. As a result, the client breaks the pipeline, marking the first 
> datanode (DN1) as the bad node. It then restarts the pipeline with the rest 
> of the datanodes. But the next (second) datanode was stuck waiting for the 
> earlier block write to complete. So the client repeats this procedure until 
> it runs out of datanodes and the client write fails.
> I think this should be a blocker for either 0.16 or 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
