[ 
https://issues.apache.org/jira/browse/HDFS-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-9558:
-----------------------------
    Target Version/s:   (was: 2.8.0)

> Replication requests from datanode always blames the source datanode in case 
> of Checksum Exception.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9558
>                 URL: https://issues.apache.org/jira/browse/HDFS-9558
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Rushabh S Shah
>
> Replication requests from a datanode (in case of a rack failure event) always 
> blame the source datanode if any of the downstream nodes encounters a 
> ChecksumException.
> We saw this case recently in our cluster.
> We lost 7 nodes in a rack.
> There was only one replica of the block (say on dnA).
> The namenode asks dnA to replicate to dnB and dnC.
> {noformat}
> 2015-12-13 21:09:41,798 [DataNode:   heartbeating to NN:8020] INFO 
> datanode.DataNode: DatanodeRegistration(dnA, 
> datanodeUuid=bc1f183d-b74a-49c9-ab1a-d1d496ab77e9, infoPort=1006, 
> infoSecurePort=0, ipcPort=8020, 
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571)
>  Starting thread to transfer 
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 to dnB:1004 
> dnC:1004 
> {noformat}
> All the packets going out from dnB's interface were getting corrupted.
> So dnC received a corrupt block from dnB, but it reported the bad block to 
> the namenode as coming from dnA.
> The following are the logs from dnC:
> {noformat}
> 2015-12-13 21:09:43,444 [DataXceiver for client  at /dnB:34879 [Receiving 
> block BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617]] WARN 
> datanode.DataNode: Checksum error in block 
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 from /dnB:34879
> org.apache.hadoop.fs.ChecksumException: Checksum error:  at 58368 exp: 
> -1657951272 got: 856104973
>         at 
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
> Method)
>         at 
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
>         at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
>         at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:416)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:550)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:853)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:761)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:237)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-12-13 21:09:43,445 [DataXceiver for client  at dnB:34879 [Receiving 
> block BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617]] INFO 
> datanode.DataNode: report corrupt 
> BP-1620678153-XXXX-1351096255769:blk_3065507810_1107476861617 from datanode 
> dnA:1004 to namenode
> {noformat}
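
The misattribution described above can be sketched in a few lines of Java. This is only an illustration of the reported behaviour, not HDFS code: the class PipelineBlameSketch and both methods are hypothetical names, and the "blame the immediate upstream node" variant is an assumption about one conceivable alternative, not necessarily how this issue was or will be resolved. The pipeline dnA -> dnB -> dnC is taken from the logs quoted above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in Hadoop.
public class PipelineBlameSketch {

    // Behaviour described in the report: whichever downstream node detects
    // the checksum error, the first node in the pipeline (the replication
    // source) is the one reported as corrupt.
    static String blameSource(List<String> pipeline, int detectingIndex) {
        checkIndex(pipeline, detectingIndex);
        return pipeline.get(0);
    }

    // Assumed alternative, for comparison only: report the node that
    // actually sent the data to the detecting node.
    static String blameImmediateUpstream(List<String> pipeline, int detectingIndex) {
        checkIndex(pipeline, detectingIndex);
        return pipeline.get(detectingIndex - 1);
    }

    // The detecting node must be downstream of at least one other node.
    private static void checkIndex(List<String> pipeline, int detectingIndex) {
        if (detectingIndex <= 0 || detectingIndex >= pipeline.size()) {
            throw new IllegalArgumentException(
                "detecting node must be downstream of the source: " + detectingIndex);
        }
    }

    public static void main(String[] args) {
        // Pipeline from the report: dnA transfers the block to dnB, which forwards it to dnC.
        List<String> pipeline = Arrays.asList("dnA", "dnB", "dnC");
        int detector = pipeline.indexOf("dnC");

        System.out.println("reported behaviour blames: "
            + blameSource(pipeline, detector));
        System.out.println("immediate upstream would be: "
            + blameImmediateUpstream(pipeline, detector));
    }
}
```

In the scenario from the report, the first method returns dnA (the only good replica) while the corruption was introduced on dnB's outgoing interface, which is why marking dnA's replica as corrupt is harmful here.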



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

