[ https://issues.apache.org/jira/browse/HDFS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445350#comment-13445350 ]
Todd Lipcon commented on HDFS-3874:
-----------------------------------
The bug seems to be that the datanode doesn't report the right remote DN when
it detects a checksum error while receiving a block: the corrupt-block report
names the source as "datanode :0". Here are the DN-side logs:
{code}
2012-08-27 16:34:30,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Checksum error in block BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 from /172.29.97.219:52544
org.apache.hadoop.fs.ChecksumException: Checksum error: DFSClient_NONMAPREDUCE_334070927_1 at 44032 exp: -983390667 got: 557443094
    at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:335)
    at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:266)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:377)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:496)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
    at java.lang.Thread.run(Thread.java:662)
2012-08-27 16:34:30,396 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: report corrupt block BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 from datanode :0 to namenode
{code}
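To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the HDFS code: NodeId, Registry, markCorrupt and tryReport are hypothetical stand-ins, and the assumption that an unset source ID renders as ":0" (empty host, port 0) is only inferred from the "datanode :0" text in the logs. It shows how a report carrying such an ID would be rejected by a lookup against the set of known datanodes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CorruptReportSketch {
    /** Hypothetical stand-in for a datanode identity: host plus transfer port. */
    static final class NodeId {
        final String host;
        final int port;

        NodeId(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            // An unset ID (empty host, port 0) renders as ":0" (assumption).
            return host + ":" + port;
        }
    }

    /** Hypothetical stand-in for the namenode's view of registered datanodes. */
    static final class Registry {
        private final Map<String, NodeId> known = new HashMap<>();

        void register(NodeId id) {
            known.put(id.toString(), id);
        }

        /** Rejects the report when the named source is not a known datanode. */
        void markCorrupt(String block, NodeId reportedSource) throws IOException {
            if (!known.containsKey(reportedSource.toString())) {
                throw new IOException("Cannot mark " + block
                    + " as corrupt because datanode " + reportedSource
                    + " does not exist");
            }
            // Corrupt-replica bookkeeping is omitted in this sketch.
        }
    }

    /** Returns "ok" on success, otherwise the exception message. */
    static String tryReport(Registry registry, String block, NodeId source) {
        try {
            registry.markCorrupt(block, source);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new NodeId("172.29.97.219", 50010));

        String block = "blk_8285012733733669474_140475196";
        // A report with an unset source ID fails the lookup.
        System.out.println(tryReport(registry, block, new NodeId("", 0)));
        // A report naming a registered datanode is accepted.
        System.out.println(tryReport(registry, block, new NodeId("172.29.97.219", 50010)));
    }
}
```

In this sketch the first report fails with the same "datanode :0 does not exist" wording seen on the NN side, while the second is accepted, which is consistent with the problem being the source ID the DN puts in the report rather than the NN-side lookup.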
> Exception when client reports bad checksum to NN
> ------------------------------------------------
>
> Key: HDFS-3874
> URL: https://issues.apache.org/jira/browse/HDFS-3874
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client, name-node
> Affects Versions: 2.0.0-alpha
> Reporter: Todd Lipcon
>
> We see the following exception in our logs on a cluster:
> {code}
> 2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.reportBadBlocks
> 2012-08-27 16:34:30,400 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist
> 2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 46 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 172.29.97.219:43805: error: java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist
> java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537)
>     at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242)
>     at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> {code}