[
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792285#action_12792285
]
Todd Lipcon commented on HDFS-101:
----------------------------------
Just applied
https://issues.apache.org/jira/secure/attachment/12428383/detectDownDN1-0.20.patch
and tested on the cluster. I think the other error I mentioned above is just
HDFS-630, since I'm testing on 0.20 on a 3-node cluster, so +1 on this patch.
bq. clientName.len == 0 means that this is a block copy for replication. It has
nothing to do if this is the last DN in pipeline or not.
Right, but my question is whether clientName.len can ever be 0 when there's a
mirror. My belief is no. Perhaps it's worth an assert there (since we're now
cool with assertions in HDFS).
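
For illustration, a minimal sketch of what such an assert could look like. The class and method names (PipelineInvariant, holds, check) and the hasMirror flag are hypothetical and not taken from the DataNode code; the sketch only encodes the belief above that an empty clientName (a block copy for replication) never coincides with a mirror.

```java
// Hypothetical sketch: these names are illustrative, not the actual DataNode code.
final class PipelineInvariant {
    private PipelineInvariant() {}

    /** True when the request is a block copy for replication (empty client name). */
    public static boolean isReplicationCopy(String clientName) {
        return clientName.length() == 0;
    }

    /** The assumed invariant: a replication copy never has a mirror. */
    public static boolean holds(String clientName, boolean hasMirror) {
        return !(isReplicationCopy(clientName) && hasMirror);
    }

    /** Checks the invariant; the assert is only evaluated when the JVM runs with -ea. */
    public static void check(String clientName, boolean hasMirror) {
        assert holds(clientName, hasMirror)
            : "empty clientName (replication copy) but a mirror is present";
    }
}
```

If the belief turns out to be wrong, a run with assertions enabled would surface the case instead of silently taking the wrong branch.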
> DFS write pipeline : DFSClient sometimes does not detect second datanode
> failure
> ---------------------------------------------------------------------------------
>
> Key: HDFS-101
> URL: https://issues.apache.org/jira/browse/HDFS-101
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Raghu Angadi
> Assignee: Hairong Kuang
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch,
> detectDownDN2.patch, hdfs-101.tar.gz
>
>
> When the first datanode's write to the second datanode fails or times out,
> DFSClient ends up marking the first datanode as the bad one and removes it from
> the pipeline. A similar problem exists on the DataNode as well, and it is fixed in
> HADOOP-3339. From HADOOP-3339:
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of
> DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty
> coarse control. We don't know what state the responder is in and interrupting
> has different effects depending on responder state. To fix this properly we
> need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should
> properly read all the data left in the socket. Also, DataNode's closing of
> the socket should not result in a TCP reset, otherwise I think DFSClient will
> not be able to read from the socket.
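
As a rough illustration of "read all the data left in the socket", the sketch below drains an InputStream until end-of-stream. The SocketDrain class and drain method are hypothetical names, not DFSClient code, and a plain InputStream stands in for the socket's stream. A TCP reset from the peer would show up as an IOException from read(), wrapped here as an UncheckedIOException, which is why the description asks that the DataNode's close not cause a reset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: not the DFSClient implementation.
final class SocketDrain {
    private SocketDrain() {}

    /** Reads until end-of-stream and returns every byte that was still pending. */
    public static byte[] drain(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int n;
            // read() returns -1 once the peer has closed the connection cleanly.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // A connection reset surfaces here; unread data is lost in that case.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```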