[
https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762466#action_12762466
]
Kan Zhang commented on HDFS-564:
--------------------------------
I've attached 2 preliminary patches.
h564-24.patch is a patch for the pre-append-merge trunk. It changes the
behavior of BlockReceiver.java in 2 ways:
1) When a downstream error happens (ending up in handleMirrorOutError()), the
receiver thread no longer interrupts the responder thread. Such an interrupt
may lead the responder to behave as if a local error had occurred and give the
upstream node the wrong idea.
2) The responder will try to read all downstream statuses (up to the first
ERROR status) before sending its own status and forwarding the others to the
upstream node. If the responder fails to read all the downstream statuses it
needs, it will mark the next downstream datanode as ERROR.
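To make change 2) concrete, here is a minimal standalone sketch of the reply-building rule, not the code in the patch. The class, enum and method names are hypothetical, the read from the mirror is modeled as a plain list, and "next downstream datanode" is assumed to mean the immediate downstream neighbor, which is only one reading of the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from BlockReceiver.java.
class PipelineAckSketch {
    // Simplified status codes (assumption: only two outcomes matter here).
    public enum Status { SUCCESS, ERROR }

    /**
     * Builds the list of statuses a datanode sends upstream: its own status
     * first, followed by the downstream statuses in pipeline order.
     *
     * @param local          this datanode's own status
     * @param readFromMirror statuses actually read from downstream; a list shorter
     *                       than needed models a failed or truncated read
     * @param expected       number of datanodes downstream of this one
     */
    public static List<Status> buildUpstreamReply(
            Status local, List<Status> readFromMirror, int expected) {
        List<Status> downstream = new ArrayList<>();
        for (Status s : readFromMirror) {
            if (downstream.size() == expected) {
                break; // already have everything we need
            }
            downstream.add(s);
            if (s == Status.ERROR) {
                break; // nothing is read past the first ERROR
            }
        }

        boolean sawError = !downstream.isEmpty()
                && downstream.get(downstream.size() - 1) == Status.ERROR;
        boolean complete = sawError || downstream.size() == expected;

        List<Status> reply = new ArrayList<>();
        reply.add(local);
        if (complete) {
            reply.addAll(downstream);
        } else {
            // Assumption: an incomplete read is blamed on the immediate
            // downstream neighbor, so it alone is reported as ERROR.
            reply.add(Status.ERROR);
        }
        return reply;
    }
}
```

With two downstream datanodes and only one SUCCESS read before the read fails, this sketch returns [SUCCESS, ERROR], so the upstream node sees the local datanode as healthy and its downstream neighbor as the suspect.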
h564-24.patch implements all the tests except 26-28 and 31-33. In tests 26-28,
I've seen intermittent failures similar to those described in HDFS-101, i.e.,
when the first datanode sends all statuses to DFSClient and closes the socket,
DFSClient isn't able to read those statuses and instead gets a TCP reset. As a
result, DFSClient mistakenly considers the first datanode to be at fault. In
tests 31-33, the DFSClient keeps receiving seqno == -1 (keep-alive) and hangs.
h564-25.patch is a quick port of h564-24.patch to the current post-append-merge
trunk. Unfortunately, many of the tests are failing and I may not have time to
investigate them. Hopefully, someone can pick it up from here.
> Adding pipeline test 17-35
> --------------------------
>
> Key: HDFS-564
> URL: https://issues.apache.org/jira/browse/HDFS-564
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Kan Zhang
> Assignee: Kan Zhang
> Attachments: h564-24.patch, h564-25.patch
>
>