[
https://issues.apache.org/jira/browse/HADOOP-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588135#action_12588135
]
Raghu Angadi commented on HADOOP-3234:
--------------------------------------
This is not an issue with non-blocking I/O. It looks like reads and writes on
regular (blocking) sockets are not interruptible (really?). So this will be a
very rare problem once HADOOP-3124 is committed,
"dfs.datanode.socket.write.timeout" is set to 0, and something like
HADOOP-3132 happens. On 0.16 it is not much of an issue, since there is no
write timeout at all.
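To make the interruptibility point concrete, below is a minimal standalone
sketch (not Hadoop code) showing that a thread blocked in read() on a plain
java.net.Socket ignores Thread.interrupt(); only closing the socket unblocks
it. A blocking read on a java.nio SocketChannel, by contrast, would be broken
by the interrupt with ClosedByInterruptException, which is why the NIO path
does not have this problem. As I understand HADOOP-3124, setting
"dfs.datanode.socket.write.timeout" to 0 means plain blocking streams are used
with no timeout, which is exactly the case where a stuck write hangs forever.

{code:java}
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Standalone sketch (not Hadoop code): a thread blocked in read() on a
// plain java.net.Socket does not respond to Thread.interrupt(); only
// closing the socket from another thread unblocks it.
public class BlockingReadSketch {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);
    final Socket client = new Socket("localhost", server.getLocalPort());
    server.accept(); // accept the connection but never write to it

    Thread reader = new Thread(new Runnable() {
      public void run() {
        try {
          InputStream in = client.getInputStream();
          in.read(); // blocks: the peer never sends anything
        } catch (Exception e) {
          System.out.println("read aborted: " + e);
        }
      }
    });
    reader.start();

    Thread.sleep(1000);
    reader.interrupt();   // has no effect on the blocked read
    reader.join(2000);
    System.out.println("still blocked after interrupt: " + reader.isAlive());

    client.close();       // closing the socket is what unblocks read()
    reader.join();
    server.close();
  }
}
{code}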
> Write pipeline does not recover from first node failure sometimes.
> ------------------------------------------------------------------
>
> Key: HADOOP-3234
> URL: https://issues.apache.org/jira/browse/HADOOP-3234
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.17.0
>
>
> While investigating HADOOP-3132, we had a misconfiguration that resulted in
> the client writing to the first datanode in the pipeline with a 15 second
> write timeout. As a result, the client breaks the pipeline, marking the first
> datanode (DN1) as the bad node. It then restarts the pipeline with the rest
> of the datanodes. But the next (second) datanode was stuck waiting for the
> earlier block-write to complete. So the client repeats this procedure until
> it runs out of datanodes and the client write fails (see the sketch below).
> I think this should be a blocker either for 0.16 or 0.17.
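To illustrate the loop described above, here is a schematic sketch of the
recovery procedure (not the actual DFSClient code; writeBlockThrough() is a
hypothetical stand-in for streaming the block through the pipeline). Because
each timeout blames the first surviving node while the genuinely stuck node
stays downstream, the retries drain the whole list:

{code:java}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Schematic only; not the actual DFSClient code. writeBlockThrough() is a
// hypothetical stand-in for streaming the block through the given pipeline.
public class PipelineRecoverySketch {
  void writeWithRecovery(List<String> targets) throws IOException {
    List<String> pipeline = new ArrayList<String>(targets);
    while (!pipeline.isEmpty()) {
      try {
        writeBlockThrough(pipeline); // may time out on a stuck downstream write
        return;                      // success
      } catch (SocketTimeoutException e) {
        // The client blames the first node in the current pipeline and
        // restarts with the remaining datanodes.
        pipeline.remove(0);
      }
    }
    // Reached once the retries have exhausted every datanode.
    throw new IOException("client write failed: all datanodes marked bad");
  }

  private void writeBlockThrough(List<String> pipeline) throws IOException {
    // hypothetical: send the block to pipeline.get(0), which forwards it
    // downstream; blocks while the receiving datanode is stuck.
  }
}
{code}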