[
https://issues.apache.org/jira/browse/HDFS-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548459#comment-16548459
]
Wei-Chiu Chuang commented on HDFS-8533:
---------------------------------------
I've been trying to reproduce this bug following the description in this jira,
without success.
Here's my test code for future reference:
{code:java}
/** Check if nn.getCorruptFiles() returns a file that has corrupted blocks. */
@Test (timeout=300000)
public void testListCorruptFilesCorruptedBlock2() throws Exception {
  MiniDFSCluster cluster = null;
  Random random = new Random();
  try {
    Configuration conf = new HdfsConfiguration();
    // datanode scans directories
    conf.setInt(DFSConfigKeys.DFS_DATANODE_DIRECTORYSCAN_INTERVAL_KEY, 1);
    // datanode sends block reports
    conf.setInt(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 3 * 1000);
    // Set short retry timeouts so this test runs faster
    conf.setInt(DFSConfigKeys.DFS_CLIENT_RETRY_WINDOW_BASE, 10);
    // start 2 DNs
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    FileSystem fs = cluster.getFileSystem();
    // create two files with one block each
    DFSTestUtil util = new DFSTestUtil.Builder()
        .setName("testCorruptFilesCorruptedBlock").setNumFiles(2)
        .setMaxLevels(1).setMaxSize(512).build();
    util.createFiles(fs, "/srcdat10");
    // Now deliberately corrupt one block
    String bpid = cluster.getNamesystem().getBlockPoolId();
    File storageDir = cluster.getInstanceStorageDir(0, 1);
    File data_dir = MiniDFSCluster.getFinalizedDir(storageDir, bpid);
    assertTrue("data directory does not exist", data_dir.exists());
    List<File> metaFiles = MiniDFSCluster.getAllBlockFiles(data_dir);
    assertTrue("Data directory does not contain any blocks or there was an "
        + "IO error", metaFiles != null && !metaFiles.isEmpty());
    File metaFile = metaFiles.get(0);
    RandomAccessFile file = new RandomAccessFile(metaFile, "rw");
    FileChannel channel = file.getChannel();
    long position = channel.size() - 2;
    int length = 2;
    byte[] buffer = new byte[length];
    random.nextBytes(buffer);
    channel.write(ByteBuffer.wrap(buffer), position);
    file.close();
    LOG.info("Deliberately corrupting file " + metaFile.getName() +
        " at offset " + position + " length " + length);
    // read all files to trigger detection of corrupted replica
    try {
      util.checkFiles(fs, "/srcdat10");
    } catch (BlockMissingException e) {
      System.out.println("Received BlockMissingException as expected.");
    } catch (IOException e) {
      assertTrue("Corrupted replicas not handled properly. " +
          "Expecting BlockMissingException but received IOException " + e,
          false);
    }
    LOG.info("Restarting Datanode to trigger BlockPoolSliceScanner");
    cluster.restartDataNodes();
    cluster.waitActive();
    cluster.stopDataNode(1);
    // fetch bad file list from namenode. There should be one file.
    final NameNode namenode = cluster.getNameNode();
    while (cluster.getNamesystem().getBlockManager().getMissingBlocksCount()
        == 0) {
      Thread.sleep(1000);
      LOG.info("Still waiting for missing block");
    }
    assertEquals(1,
        cluster.getNamesystem().getBlockManager().getMissingBlocksCount());
    Collection<FSNamesystem.CorruptFileBlockInfo> badFiles =
        namenode.getNamesystem().listCorruptFileBlocks("/", null);
    LOG.info("Namenode has bad files. " + badFiles.size());
    assertTrue("Namenode has " + badFiles.size() + " bad files. Expecting 1.",
        badFiles.size() == 1);
    util.cleanup(fs, "/srcdat10");
  } finally {
    if (cluster != null) { cluster.shutdown(); }
  }
}
{code}
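The counting discrepancy described in the issue below can be modeled with a small, self-contained sketch. This is a simplified, hypothetical model of the two counting rules, not actual HDFS code; the class and method names are invented for illustration, and the assumption that fsck's totalReplicasPerBlock excludes corrupt replicas is taken from the quoted snippet:
{code:java}
// Hypothetical, simplified model of the two missing-block counting rules
// described in HDFS-8533. Names are invented for illustration; this is not
// actual HDFS code.
public class MissingBlockCountSketch {

  /** State of one block: healthy (live) replicas and corrupt replicas. */
  static final class BlockState {
    final int liveReplicas;
    final int corruptReplicas;
    BlockState(int liveReplicas, int corruptReplicas) {
      this.liveReplicas = liveReplicas;
      this.corruptReplicas = corruptReplicas;
    }
  }

  /**
   * fsck rule (per the quoted snippet): a block is missing only when no
   * replicas exist at all AND none of them are merely corrupt. Assumption:
   * totalReplicasPerBlock here counts only live replicas.
   */
  static boolean fsckCountsAsMissing(BlockState b) {
    int totalReplicasPerBlock = b.liveReplicas;
    boolean isCorrupt = b.corruptReplicas > 0;
    return totalReplicasPerBlock == 0 && !isCorrupt;
  }

  /**
   * Metrics rule (JMX, "dfsadmin -report", UI, logs, as described in the
   * issue): a block is missing as soon as no live replica remains, even if
   * corrupt replicas still exist.
   */
  static boolean metricsCountAsMissing(BlockState b) {
    return b.liveReplicas == 0;
  }

  public static void main(String[] args) {
    // Scenario from the issue: replication=3 on a 2-DN cluster, one replica
    // corrupted on DN1, DN2 down -> 0 live replicas, 1 corrupt replica.
    BlockState block = new BlockState(0, 1);
    System.out.println("fsck missing:    " + fsckCountsAsMissing(block));
    System.out.println("metrics missing: " + metricsCountAsMissing(block));
  }
}
{code}
Under this model the scenario yields fsck missing = false (count 0) but metrics missing = true (count 1), which matches the mismatch reported below.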
> Mismatch in displaying the "MissingBlock" count in fsck and in other metric
> reports
> -----------------------------------------------------------------------------------
>
> Key: HDFS-8533
> URL: https://issues.apache.org/jira/browse/HDFS-8533
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: J.Andreina
> Assignee: J.Andreina
> Priority: Critical
>
> Number of DNs = 2
> Step 1: Write a file with replication factor 3.
> Step 2: Corrupt a replica on DN1.
> Step 3: DN2 is down.
> The missing block count in the reports is as follows:
> Fsck report: *0*
> Jmx, "dfsadmin -report", UI, logs: *1*
> In fsck, only blocks whose replicas are all missing and have not been
> corrupted are counted:
> {code}
> if (totalReplicasPerBlock == 0 && !isCorrupt) {
>   // If the block is corrupted, it means all its available replicas are
>   // corrupted. We don't mark it as missing given these available replicas
>   // might still be accessible as the block might be incorrectly marked as
>   // corrupted by client machines.
> {code}
> While in the other reports, a block is considered missing even if all of its
> replicas are corrupted.
> Please provide your thoughts: can we make the missing block count consistent
> across all the reports, same as implemented for fsck?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]