[
https://issues.apache.org/jira/browse/HDFS-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Junping Du updated HDFS-9558:
-----------------------------
Target Version/s: (was: 2.8.0)
> Replication requests from datanode always blames the source datanode in case
> of Checksum Exception.
> ---------------------------------------------------------------------------------------------------
>
> Key: HDFS-9558
> URL: https://issues.apache.org/jira/browse/HDFS-9558
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Rushabh S Shah
>
> Replication requests from a datanode (for example, after a rack failure)
> always blame the source datanode if any of the downstream nodes encounters a
> ChecksumException.
> We saw this case recently in our cluster.
> We lost 7 nodes in a rack.
> There was only one replica of the block (say on dnA).
> The namenode asked dnA to replicate to dnB and dnC.
> {noformat}
> 2015-12-13 21:09:41,798 [DataNode: heartbeating to NN:8020] INFO
> datanode.DataNode: DatanodeRegistration(dnA,
> datanodeUuid=bc1f183d-b74a-49c9-ab1a-d1d496ab77e9, infoPort=1006,
> infoSecurePort=0, ipcPort=8020,
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571)
> Starting thread to transfer
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 to dnB:1004
> dnC:1004
> {noformat}
> All the packets going out from dnB's interface were getting corrupted,
> so dnC received a corrupt block and reported a bad block (attributed to dnA)
> to the namenode.
> The following are the logs from dnC:
> {noformat}
> 2015-12-13 21:09:43,444 [DataXceiver for client at /dnB:34879 [Receiving
> block BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617]] WARN
> datanode.DataNode: Checksum error in block
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 from /dnB:34879
> org.apache.hadoop.fs.ChecksumException: Checksum error: at 58368 exp:
> -1657951272 got: 856104973
> at
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native
> Method)
> at
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
> at
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
> at
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:416)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:550)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:853)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:761)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:237)
> at java.lang.Thread.run(Thread.java:745)
> 2015-12-13 21:09:43,445 [DataXceiver for client at dnB:34879 [Receiving
> block BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617]] INFO
> datanode.DataNode: report corrupt
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 from datanode
> dnA:1004 to namenode
> {noformat}
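The misattribution described above can be illustrated with a small model. This is only a sketch, not the DataNode code: the class `PipelineBlameSketch` and both method names are hypothetical, and the pipeline is reduced to an ordered list of node names (dnA, dnB, dnC as in the report). It contrasts the node that gets reported with the set of nodes that could actually have introduced the corruption.

```java
import java.util.List;

public class PipelineBlameSketch {

    /**
     * Behavior described in this issue: whichever node detects the checksum
     * error, the report names the first node of the pipeline (the source).
     * Hypothetical helper for illustration, not a Hadoop API.
     */
    static String blamedAsReported(List<String> pipeline, int detectingIndex) {
        return pipeline.get(0);
    }

    /**
     * Nodes that could have corrupted the data before it reached the
     * detecting node: every node upstream of it, not only the source.
     * Hypothetical helper for illustration, not a Hadoop API.
     */
    static List<String> possibleCulprits(List<String> pipeline, int detectingIndex) {
        return pipeline.subList(0, detectingIndex);
    }
}
```

For the pipeline dnA, dnB, dnC with the error detected at dnC (index 2), the first method returns dnA, while the second returns both dnA and dnB, which matches the case above where dnB's interface was the real cause.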