[
https://issues.apache.org/jira/browse/HDFS-10667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394998#comment-15394998
]
Yongjun Zhang commented on HDFS-10667:
--------------------------------------
Hi [~yuanbo],
I was about to commit the patch and had one thought here. In the current patch,
the new parameter {{header}} was added to {{verifyChunks(dataBuf,
checksumBuf);}} just to make the info available when an error is found. That is
wasteful, since the parameter is not needed in a normal run. It is also
unnecessary, since we can make the following call:
{code}
// It's required that the packetReceiver here is the same one used to get
// the two parameters dataBuf and checksumBuf passed to this
// method.
PacketHeader header = packetReceiver.getHeader();
{code}
inside the exception handling code of {{verifyChunks}}.
This means {{verifyChunks}} requires that the {{header}} obtained as I
suggested above be consistent with the other two parameters passed to it.
This is ok, since {{verifyChunks}} is a private method. That's why I added
the above comment in the suggested change.
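To illustrate the shape of the change, here is a minimal, self-contained sketch. It is not the actual {{BlockReceiver}} code: {{VerifyChunksSketch}}, {{ChecksumMismatch}}, {{checkChunks}}, {{checksums}} and the simplified {{PacketHeader}} / {{PacketReceiver}} below are hypothetical stand-ins, the CRC32-per-512-byte-chunk check is only a placeholder for the real checksum verification, and the sketch assumes the header's offset in block plus the position in the packet buffer gives the offset in the block.

```java
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical stand-ins for the HDFS classes; not the real Hadoop API.
class VerifyChunksSketch {
    static final int BYTES_PER_CHUNK = 512; // assumed chunk size for this sketch

    /** Stand-in header: only carries the packet's starting offset in the block. */
    static final class PacketHeader {
        private final long offsetInBlock;
        PacketHeader(long offsetInBlock) { this.offsetInBlock = offsetInBlock; }
        long getOffsetInBlock() { return offsetInBlock; }
    }

    /** Stand-in receiver: holds the header of the packet it last received. */
    static final class PacketReceiver {
        private final PacketHeader header;
        PacketReceiver(PacketHeader header) { this.header = header; }
        PacketHeader getHeader() { return header; }
    }

    /** Stand-in for a checksum exception; the position is relative to the packet buffer. */
    static final class ChecksumMismatch extends Exception {
        final long posInPacket;
        ChecksumMismatch(long posInPacket) {
            super("Checksum error at " + posInPacket);
            this.posInPacket = posInPacket;
        }
    }

    private final PacketReceiver packetReceiver;

    VerifyChunksSketch(PacketReceiver packetReceiver) {
        this.packetReceiver = packetReceiver;
    }

    /** Computes one CRC32 value per chunk of data. */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHUNK;
            CRC32 crc = new CRC32();
            crc.update(data, off, Math.min(BYTES_PER_CHUNK, data.length - off));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Throws at the start offset of the first chunk whose checksum does not match. */
    static void checkChunks(byte[] data, long[] expected) throws ChecksumMismatch {
        long[] actual = checksums(data);
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                throw new ChecksumMismatch((long) i * BYTES_PER_CHUNK);
            }
        }
    }

    // The signature keeps only the two buffers; no header parameter is needed.
    void verifyChunks(byte[] dataBuf, long[] checksumBuf) throws IOException {
        try {
            checkChunks(dataBuf, checksumBuf);
        } catch (ChecksumMismatch e) {
            // It's required that the packetReceiver here is the same one used to
            // get the two parameters dataBuf and checksumBuf passed to this method.
            PacketHeader header = packetReceiver.getHeader();
            long offsetInBlock = header.getOffsetInBlock() + e.posInPacket;
            throw new IOException("Checksum error: offset in packet "
                + e.posInPacket + ", offset in block " + offsetInBlock, e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        long[] sums = checksums(data);
        data[600] ^= 0x01; // corrupt one byte in the second chunk
        VerifyChunksSketch receiver = new VerifyChunksSketch(
            new PacketReceiver(new PacketHeader(4096L)));
        try {
            receiver.verifyChunks(data, sums);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The header is fetched only on the error path, so the normal path pays nothing for it, at the cost of the consistency requirement stated in the comment.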
What do you think? Thanks.
> Report more accurate info about data corruption location
> --------------------------------------------------------
>
> Key: HDFS-10667
> URL: https://issues.apache.org/jira/browse/HDFS-10667
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, hdfs
> Reporter: Yongjun Zhang
> Assignee: Yuanbo Liu
> Attachments: HDFS-10667.001.patch, HDFS-10667.002.patch,
> HDFS-10667.003.patch
>
>
> Per
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897
> the DataNode at 10.6.129.77 reports:
> {code}
> 2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest:
> /10.6.129.77:5080
> 2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
> org.apache.hadoop.fs.ChecksumException: Checksum error:
> DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
> at
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native
> Method)
> at
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
> at
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
> at
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Exception for blk_1116167880_42906656
> java.io.IOException: Terminating due to a checksum error.java.io.IOException:
> Unexpected checksum mismatch while writing blk_1116167880_42906656 from
> /10.6.134.229:43844
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> and
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879
> {quote}
> While verifying only packet, the position mentioned in the checksum
> exception, is relative to packet buffer offset, not the block offset. So
> 81920 is the offset in the exception.
> {quote}
> Create this jira to report more accurate corruption location information: the
> offset in the file, offset in block, and offset in packet.
> See
> https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083