[
https://issues.apache.org/jira/browse/HDFS-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Colin Patrick McCabe resolved HDFS-9955.
----------------------------------------
Resolution: Duplicate
> DataNode won't self-heal after some block dirs were manually misplaced
> ----------------------------------------------------------------------
>
> Key: HDFS-9955
> URL: https://issues.apache.org/jira/browse/HDFS-9955
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Environment: CentOS 6, Cloudera 5.4.4 (patched Hadoop 2.6.0)
> Reporter: David Watzke
> Labels: data-integrity
>
> I accidentally ran this tool on top of a DataNode's data directories (the
> DataNode was shut down at the time):
> https://github.com/killerwhile/volume-balancer
> The tool makes assumptions about block directory placement that are no
> longer valid in Hadoop 2.6.0; it simply moved block directories between
> datadirs to balance disk usage. Running it was clearly a mistake, but my
> concern is how the DataNode handled, or rather failed to handle, the
> resulting state. The log messages below show that the DataNode noticed the
> misplaced blocks but did nothing to fix them (e.g. self-heal by copying the
> other replica), which seems like a bug to me (see the layout sketch below,
> and a possible manual fix after the log). If you need any additional
> information, please just ask.
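> For context: since HDFS-6482 (which shipped in 2.6.0), the DataNode derives
> each block's subdirectory deterministically from its block ID, so a block
> file moved anywhere else is invisible to the dataset. Below is a minimal
> sketch of that layout computation; it approximates
> DatanodeUtil.idToBlockDir from branch-2.6, and the exact masks should be
> treated as my assumption, to be verified against your Hadoop version:
> {code:java}
> import java.io.File;
>
> // Sketch of the block-ID-based layout from HDFS-6482 (Hadoop 2.6.0).
> // Approximates DatanodeUtil.idToBlockDir; verify the masks against the
> // exact Hadoop version before relying on them.
> public class BlockDirSketch {
>   public static File idToBlockDir(File finalizedRoot, long blockId) {
>     int d1 = (int) ((blockId >> 16) & 0xff);  // first-level subdir index
>     int d2 = (int) ((blockId >> 8) & 0xff);   // second-level subdir index
>     return new File(finalizedRoot, "subdir" + d1 + "/subdir" + d2);
>   }
>
>   public static void main(String[] args) {
>     // blk_1226781281 from the log below; with these masks the expected
>     // location is .../finalized/subdir31/subdir50
>     File root = new File("/data/18/cdfs/dn/current/"
>         + "BP-680964103-A.B.C.D-1375882473930/current/finalized");
>     System.out.println(idToBlockDir(root, 1226781281L));
>   }
> }
> {code}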
> {noformat}
> 2016-03-04 12:40:06,008 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_-3159875140074863904_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,009 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_8369468090548520777_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,011 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_1226431637_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,012 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_1169332185_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,825 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-680964103-A.B.C.D-1375882473930:blk_1226781281_1099829669050 received exception java.io.IOException: BlockId 1226781281 is not valid.
> 2016-03-04 12:40:06,825 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(X.Y.Z.30, datanodeUuid=9da950ca-87ae-44ee-9391-0bca669c796b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster12;nsid=1625487778;c=1438754073236):Got exception while serving BP-680964103-A.B.C.D-1375882473930:blk_1226781281_1099829669050 to /X.Y.Z.30:48146
> java.io.IOException: BlockId 1226781281 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-03-04 12:40:06,826 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: prg04-002.xyz.tld:50010:DataXceiver error processing READ_BLOCK operation src: /X.Y.Z.30:48146 dst: /X.Y.Z.30:50010
> java.io.IOException: BlockId 1226781281 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
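> In case it helps anyone who ends up in the same state: with the DataNode
> stopped, the misplaced files could in principle be moved back to their
> ID-derived directories. This is an untested sketch of mine (not a Hadoop
> tool), using the same assumed masks as above; back up the datadirs first:
> {code:java}
> import java.io.File;
> import java.io.IOException;
> import java.nio.file.Files;
>
> // Untested sketch: walk a block pool's finalized tree, recompute each
> // block file's expected subdir, and move strays (plus their .meta files)
> // back. Run only with the DataNode stopped.
> public class RelocateMisplacedBlocks {
>   static File expectedDir(File finalizedRoot, long blockId) {
>     int d1 = (int) ((blockId >> 16) & 0xff);
>     int d2 = (int) ((blockId >> 8) & 0xff);
>     return new File(finalizedRoot, "subdir" + d1 + "/subdir" + d2);
>   }
>
>   static void walk(File dir, File finalizedRoot) throws IOException {
>     File[] entries = dir.listFiles();
>     if (entries == null) return;
>     for (File f : entries) {
>       if (f.isDirectory()) {
>         walk(f, finalizedRoot);
>       } else if (f.getName().startsWith("blk_")
>           && !f.getName().endsWith(".meta")) {
>         long blockId = Long.parseLong(f.getName().substring(4));
>         File expected = expectedDir(finalizedRoot, blockId);
>         if (!f.getParentFile().equals(expected)) {
>           Files.createDirectories(expected.toPath());
>           // move the matching meta file(s), named blk_<id>_<genstamp>.meta
>           for (File m : entries) {
>             if (m.getName().startsWith(f.getName() + "_")
>                 && m.getName().endsWith(".meta")) {
>               Files.move(m.toPath(),
>                   new File(expected, m.getName()).toPath());
>             }
>           }
>           Files.move(f.toPath(), new File(expected, f.getName()).toPath());
>         }
>       }
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     // arg: .../current/BP-<id>/current/finalized
>     File finalizedRoot = new File(args[0]);
>     walk(finalizedRoot, finalizedRoot);
>   }
> }
> {code}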
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)