[
https://issues.apache.org/jira/browse/HADOOP-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588135#action_12588135
]
Raghu Angadi commented on HADOOP-3234:
--------------------------------------
This is not an issue with non-blocking I/O. It looks like reads and writes on
regular (blocking) sockets are not interruptible (really?). So this will be a
very rare problem once HADOOP-3124 is committed,
"dfs.datanode.socket.write.timeout" is set to 0, and something like
HADOOP-3132 happens. On 0.16 it is not much of an issue, since there is no
write timeout at all.
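To make the interruptibility point concrete, below is a minimal standalone
sketch (not Hadoop code) showing that a thread blocked in read() on a plain
java.net.Socket ignores Thread.interrupt(); only closing the socket unblocks
it. A blocking read on a java.nio SocketChannel, by contrast, would be broken
by the interrupt with ClosedByInterruptException, which is why the NIO path
does not have this problem. As I understand HADOOP-3124, setting
"dfs.datanode.socket.write.timeout" to 0 means plain blocking streams are used
with no timeout, which is exactly the case where a stuck write hangs forever.

{code:java}
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Standalone sketch (not Hadoop code): a thread blocked in read() on a
// plain java.net.Socket does not respond to Thread.interrupt(); only
// closing the socket from another thread unblocks it.
public class BlockingReadSketch {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);
    final Socket client = new Socket("localhost", server.getLocalPort());
    server.accept(); // accept the connection but never write to it

    Thread reader = new Thread(new Runnable() {
      public void run() {
        try {
          InputStream in = client.getInputStream();
          in.read(); // blocks: the peer never sends anything
        } catch (Exception e) {
          System.out.println("read aborted: " + e);
        }
      }
    });
    reader.start();

    Thread.sleep(1000);
    reader.interrupt();   // has no effect on the blocked read
    reader.join(2000);
    System.out.println("still blocked after interrupt: " + reader.isAlive());

    client.close();       // closing the socket is what unblocks read()
    reader.join();
    server.close();
  }
}
{code}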
> Write pipeline does not recover from first node failure sometimes.
> ------------------------------------------------------------------
>
> Key: HADOOP-3234
> URL: https://issues.apache.org/jira/browse/HADOOP-3234
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.17.0
>
>
> While investigating HADOOP-3132, we had a misconfiguration that resulted in
> the client writing to the first datanode in the pipeline with a 15 second
> write timeout. As a result, the client breaks the pipeline, marking the first
> datanode (DN1) as the bad node. It then restarts the pipeline with the rest
> of the datanodes. But the next (second) datanode was stuck waiting for the
> earlier block-write to complete. So the client repeats this procedure until
> it runs out of datanodes and the client write fails (see the sketch below).
> I think this should be a blocker either for 0.16 or 0.17.
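To illustrate the loop described above, here is a schematic sketch of the
recovery procedure (not the actual DFSClient code; writeBlockThrough() is a
hypothetical stand-in for streaming the block through the pipeline). Because
each timeout blames the first surviving node while the genuinely stuck node
stays downstream, the retries drain the whole list:

{code:java}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Schematic only; not the actual DFSClient code. writeBlockThrough() is a
// hypothetical stand-in for streaming the block through the given pipeline.
public class PipelineRecoverySketch {
  void writeWithRecovery(List<String> targets) throws IOException {
    List<String> pipeline = new ArrayList<String>(targets);
    while (!pipeline.isEmpty()) {
      try {
        writeBlockThrough(pipeline); // may time out on a stuck downstream write
        return;                      // success
      } catch (SocketTimeoutException e) {
        // The client blames the first node in the current pipeline and
        // restarts with the remaining datanodes.
        pipeline.remove(0);
      }
    }
    // Reached once the retries have exhausted every datanode.
    throw new IOException("client write failed: all datanodes marked bad");
  }

  private void writeBlockThrough(List<String> pipeline) throws IOException {
    // hypothetical: send the block to pipeline.get(0), which forwards it
    // downstream; blocks while the receiving datanode is stuck.
  }
}
{code}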