[
https://issues.apache.org/jira/browse/HDFS-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403459#comment-15403459
]
Wei-Chiu Chuang commented on HDFS-8224:
---------------------------------------
Incidentally, we just saw the same stack trace "java.io.IOException: Could not
create DataChecksum of type 0 with bytesPerChecksum 0" in a cluster. I wonder
if it's caused by a software (HDFS) bug rather than a hardware failure.
Based on my understanding, the stacktrace suggests it is the meta file of the
replica that's corrupt (its header has invalid values: type 0 and
bytesPerChecksum 0). It might be possible the replica itself is actually good.
> Any IOException in DataTransfer#run() will run diskError thread even if it is
> not disk error
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-8224
> URL: https://issues.apache.org/jira/browse/HDFS-8224
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Fix For: 2.8.0
>
>
> This happened in our 2.6 cluster.
> One of the block and its metadata file were corrupted.
> The disk was healthy in this case.
> Only the block was corrupt.
> Namenode tried to copy that block to another datanode but failed with the
> following stack trace:
> 2015-04-20 01:04:04,421
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN
> datanode.DataNode: DatanodeRegistration(a.b.c.d,
> datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006,
> infoSecurePort=0, ipcPort=8020,
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed
> to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to
> a1.b1.c1.d1:1004 got
> java.io.IOException: Could not create DataChecksum of type 0 with
> bytesPerChecksum 0
> at
> org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:287)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
> at java.lang.Thread.run(Thread.java:722)
> The following catch block in DataTransfer#run method will treat every
> IOException as disk error fault and run disk errror
> {noformat}
> catch (IOException ie) {
> LOG.warn(bpReg + ":Failed to transfer " + b + " to " +
> targets[0] + " got ", ie);
> // check if there are any disk problem
> checkDiskErrorAsync();
> }
> {noformat}
> This block was never scanned by BlockPoolSliceScanner otherwise it would have
> reported as corrupt block.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]