[
https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313007#comment-15313007
]
Wei-Chiu Chuang commented on HDFS-6937:
---------------------------------------
I am taking over Yongjun's patch because he will not have Internet access
for some time.
This is great work, and it took me some time to understand it. I think that,
instead of throwing an IOException to simulate the injection of a checksum
failure at the last datanode, the patch should enqueue an ERROR_CHECKSUM ack to
indicate the checksum failure. Without it, the last DN will shut down the
connection, and the second DN in the pipeline will not know that the cause
was a checksum failure.
{code:title=BlockReceiver.java#sendAckUpstreamUnprotected}
if (ack == null) {
  // A new OOB response is being sent from this node. Regardless of
  // downstream nodes, reply should contain one reply.
  replies = new int[] { myHeader };
} else if (mirrorError) { // ack read error
  int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS);
  int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR);
  replies = new int[] {h, h1};
} else {
  short ackLen = type == PacketResponderType.LAST_IN_PIPELINE ? 0 : ack
      .getNumOfReplies();
  replies = new int[ackLen + 1];
  replies[0] = myHeader;
  for (int i = 0; i < ackLen; ++i) {
    replies[i + 1] = ack.getHeaderFlag(i);
  }
  // If the mirror has reported that it received a corrupt packet,
  // do self-destruct to mark myself bad, instead of making the
  // mirror node bad. The mirror is guaranteed to be good without
  // corrupt data on disk.
  if (ackLen > 0 && PipelineAck.getStatusFromHeader(replies[1]) ==
      Status.ERROR_CHECKSUM) {
    throw new IOException("Shutting down writer and responder "
        + "since the down streams reported the data sent by this "
        + "thread is corrupt");
  }
}
{code}
In this piece of code, if the next DN shuts down the connection, the local DN
is always assumed to be good:
{code}
int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS);
int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR);
replies = new int[] {h, h1};
{code}
On the other hand, if the next DN responds with an ERROR_CHECKSUM, the local DN
will throw an IOException, which shuts down its connection with the previous
DN in the pipeline. In the end, this will replace the middle datanode:
{code:title=DataStreamer.java#createBlockOutputStream}
// find the datanode that matches
if (firstBadLink.length() != 0) {
  for (int i = 0; i < nodes.length; i++) {
    // NB: Unconditionally using the xfer addr w/o hostname
    if (firstBadLink.equals(nodes[i].getXferAddr())) {
      errorState.setBadNodeIndex(i);
      break;
    }
  }
}
{code}
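To make the two cases concrete, here is a minimal, self-contained sketch of the blame logic for a three-node pipeline (DN1 -> DN2 -> DN3). It is a simplified model, not Hadoop code: the class PipelineAckSketch, its Status enum, and the methods repliesUpstream, firstBadIndex and blamedNode are all hypothetical names, and the ECN header bits are left out.

```java
import java.io.IOException;

/** Hypothetical model of the ack-combining logic; not the real BlockReceiver. */
final class PipelineAckSketch {
    enum Status { SUCCESS, ERROR, ERROR_CHECKSUM }

    /** Replies a datanode sends upstream, given what it got from its mirror. */
    static Status[] repliesUpstream(boolean mirrorError, Status[] downstream)
            throws IOException {
        if (mirrorError) {
            // Connection to the mirror was lost: report self good, mirror bad.
            return new Status[] { Status.SUCCESS, Status.ERROR };
        }
        if (downstream.length > 0 && downstream[0] == Status.ERROR_CHECKSUM) {
            // Mirror says the data we sent is corrupt: mark ourselves bad.
            throw new IOException("downstream reported corrupt data");
        }
        Status[] replies = new Status[downstream.length + 1];
        replies[0] = Status.SUCCESS;
        System.arraycopy(downstream, 0, replies, 1, downstream.length);
        return replies;
    }

    /** Index of the first non-SUCCESS reply, or -1 if all succeeded. */
    static int firstBadIndex(Status[] replies) {
        for (int i = 0; i < replies.length; i++) {
            if (replies[i] != Status.SUCCESS) {
                return i;
            }
        }
        return -1;
    }

    /** Zero-based index of the node the client ends up blaming. */
    static int blamedNode(boolean tailSendsChecksumError) {
        // DN3 either acks with ERROR_CHECKSUM or just closes the connection.
        Status[] fromDn3 = tailSendsChecksumError
                ? new Status[] { Status.ERROR_CHECKSUM } : null;
        Status[] fromDn2;
        try {
            fromDn2 = repliesUpstream(fromDn3 == null, fromDn3);
        } catch (IOException e) {
            fromDn2 = null; // DN2 shut down; DN1 sees a broken mirror link
        }
        try {
            return firstBadIndex(repliesUpstream(fromDn2 == null, fromDn2));
        } catch (IOException e) {
            // Not reachable here: DN2's own reply is never ERROR_CHECKSUM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DN3 closes connection -> blamed index "
                + blamedNode(false));
        System.out.println("DN3 sends ERROR_CHECKSUM -> blamed index "
                + blamedNode(true));
    }
}
```

In this model, a DN3 that simply closes the connection gets blamed itself (index 2), while a DN3 that reports ERROR_CHECKSUM causes DN2 (index 1, the middle datanode) to be blamed.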
> Another issue in handling checksum errors in write pipeline
> -----------------------------------------------------------
>
> Key: HDFS-6937
> URL: https://issues.apache.org/jira/browse/HDFS-6937
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs-client
> Affects Versions: 2.5.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6937.001.patch, HDFS-6937.002.patch
>
>
> Given a write pipeline:
> DN1 -> DN2 -> DN3
> DN3 detected a checksum error and terminated; DN2 truncated its replica to
> the ACKed size. Then a new pipeline was attempted as
> DN1 -> DN2 -> DN4
> DN4 detected a checksum error again. Later, when DN4 was replaced with DN5
> (and so on), it failed for the same reason. This led to the observation that
> DN2's data is corrupted.
> We found that the software currently truncates DN2's replica to the ACKed
> size after DN3 terminates, but it doesn't check the correctness of the data
> already written to disk.
> So intuitively, a solution would be: when a downstream DN (DN3 here) finds a
> checksum error, it propagates this info back to the upstream DN (DN2 here);
> DN2 checks the correctness of the data already written to disk and truncates
> the replica to MIN(correctDataSize, ACKedSize).
> This issue is similar to what was reported in HDFS-3875, and the truncation
> at DN2 was actually introduced as part of the HDFS-3875 solution.
> Filing this jira for the issue reported here. HDFS-3875 was filed by
> [~tlipcon], who proposed something similar there.
> {quote}
> if the tail node in the pipeline detects a checksum error, then it returns a
> special error code back up the pipeline indicating this (rather than just
> disconnecting)
> if a non-tail node receives this error code, then it immediately scans its
> own block on disk (from the beginning up through the last acked length). If
> it detects a corruption on its local copy, then it should assume that it is
> the faulty one, rather than the downstream neighbor. If it detects no
> corruption, then the faulty node is either the downstream mirror or the
> network link between the two, and the current behavior is reasonable.
> {quote}
> Thanks.
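The truncation rule in the issue description above, MIN(correctDataSize, ACKedSize), can be sketched roughly as follows. This is only an illustration under stated assumptions: TruncationSketch and its methods are hypothetical names, the replica is modeled as an in-memory byte array with one stored checksum per fixed-size chunk, and java.util.zip.CRC32 stands in for whatever checksum type the datanode is actually configured with.

```java
import java.util.zip.CRC32;

/** Hypothetical model of the proposed truncation rule; not datanode code. */
final class TruncationSketch {

    /** CRC32 of data[off .. off+len), used as a stand-in checksum. */
    static long checksum(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /**
     * Length in bytes of the longest prefix of the replica whose chunks
     * all still match their stored checksums.
     */
    static long correctDataSize(byte[] replica, long[] storedSums,
                                int chunkSize) {
        long good = 0;
        for (int i = 0; i < storedSums.length; i++) {
            int off = i * chunkSize;
            if (off >= replica.length) {
                break;
            }
            int len = Math.min(chunkSize, replica.length - off);
            if (checksum(replica, off, len) != storedSums[i]) {
                break; // first corrupt chunk: nothing after it is trusted
            }
            good = off + len;
        }
        return good;
    }

    /** New replica length: MIN(correctDataSize, ACKedSize). */
    static long truncateTo(byte[] replica, long[] storedSums,
                           int chunkSize, long ackedSize) {
        return Math.min(correctDataSize(replica, storedSums, chunkSize),
                        ackedSize);
    }
}
```

With three 4-byte chunks and 12 bytes ACKed, corrupting a byte in the second chunk would make this sketch truncate to 4 bytes, whereas truncating to the ACKed size alone would keep the corrupt data.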