[ https://issues.apache.org/jira/browse/HDFS-17798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kuper updated HDFS-17798:
-------------------------
    Labels: pull-request-available  (was: )

> The problem that bad replicas in the mini cluster cannot be automatically
> replicated
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-17798
>                 URL: https://issues.apache.org/jira/browse/HDFS-17798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 3.3.6
>            Reporter: kuper
>            Assignee: kuper
>            Priority: Major
>              Labels: pull-request-available
>
> * In a 3-datanode cluster with a 3-replica block, if one replica becomes corrupted on a node (and the corruption did not happen during the write process), the following happens:
> ** The corrupted replica is not removed from the damaged node.
> ** Because the block is now effectively under-replicated, the namenode keeps scheduling reconstruction work that tries to re-replicate it.
> ** However, when choosing reconstruction targets, every node that already hosts a replica of this block is excluded, which in this case means all 3 datanodes are excluded (see the sketch below).
> ** No suitable target node can therefore be selected, and the block ends up in a {*}vicious cycle{*}: reconstruction is always needed but never makes progress.
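>
> The exclusion described above happens when pending reconstruction work is turned into concrete targets: roughly, the nodes reported by getContainingNodes() (including the node that only holds the corrupt replica) are handed to the block placement policy as the excluded-node set. The snippet below is only a self-contained illustration of that arithmetic with plain collections, not Hadoop code, and the node names are made up:
>
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> // Illustration only: why a 3-node cluster can never find a reconstruction
> // target for a 3-replica block once one replica is corrupt.
> public class ReconstructionTargetSketch {
>   public static void main(String[] args) {
>     // The whole mini cluster: 3 datanodes.
>     List<String> clusterNodes = Arrays.asList("dn0", "dn1", "dn2");
>
>     // dn0 now holds only a corrupt replica, but it still "contains" the
>     // block, so it is excluded just like the two healthy nodes.
>     Set<String> excludedNodes = new HashSet<>(Arrays.asList("dn0", "dn1", "dn2"));
>
>     Set<String> candidateTargets = new HashSet<>(clusterNodes);
>     candidateTargets.removeAll(excludedNodes);
>
>     // Prints 0: no target is ever available, so the reconstruction work
>     // can never complete and is simply rescheduled, i.e. the vicious cycle.
>     System.out.println("possible targets: " + candidateTargets.size());
>   }
> }
> {code}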
>
> *Reproduction*
> * On a real 3-datanode cluster, execute the following steps in order:
> ** Pick a healthy block with three replicas and corrupt (e.g. truncate) the replica file on one of its datanodes.
> ** Wait for the next datanode directory scan cycle; the namenode then reports a corrupt replica, but the block is never rebuilt from the remaining healthy replicas.
> * Or reproduce it with the following test added to TestBlockManager (testMiniClusterCannotReconstructionWhileReplicaAnomaly):
>
> {code:java}
> @Test(timeout = 60000)
> public void testMiniClusterCannotReconstructionWhileReplicaAnomaly()
>     throws IOException, InterruptedException, TimeoutException {
>   Configuration conf = new HdfsConfiguration();
>   conf.setInt("dfs.datanode.directoryscan.interval", DN_DIRECTORYSCAN_INTERVAL);
>   conf.setInt("dfs.namenode.replication.interval", 1);
>   conf.setInt("dfs.heartbeat.interval", 1);
>   String src = "/test-reconstruction";
>   Path file = new Path(src);
>   MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
>   try {
>     cluster.waitActive();
>     FSNamesystem fsn = cluster.getNamesystem();
>     BlockManager bm = fsn.getBlockManager();
>
>     // Write a small file so a single block with 3 replicas exists.
>     FSDataOutputStream out = null;
>     FileSystem fs = cluster.getFileSystem();
>     try {
>       out = fs.create(file);
>       for (int i = 0; i < 1024 * 1; i++) {
>         out.write(i);
>       }
>       out.hflush();
>     } finally {
>       IOUtils.closeStream(out);
>     }
>
>     FSDataInputStream in = null;
>     ExtendedBlock oldBlock = null;
>     try {
>       in = fs.open(file);
>       oldBlock = DFSTestUtil.getAllBlocks(in).get(0).getBlock();
>     } finally {
>       IOUtils.closeStream(in);
>     }
>
>     // Corrupt the replica on datanode 0 by truncating the block file and
>     // its meta file, then restart the datanode so the directory scanner
>     // and block reports pick up the damage.
>     DataNode dn = cluster.getDataNodes().get(0);
>     String blockPath =
>         dn.getFSDataset().getBlockLocalPathInfo(oldBlock).getBlockPath();
>     String metaBlockPath =
>         dn.getFSDataset().getBlockLocalPathInfo(oldBlock).getMetaPath();
>     Files.write(Paths.get(blockPath), Collections.emptyList());
>     Files.write(Paths.get(metaBlockPath), Collections.emptyList());
>     cluster.restartDataNode(0, true);
>     cluster.waitDatanodeConnectedToActive(dn, 60000);
>     while (!dn.isDatanodeFullyStarted()) {
>       Thread.sleep(1000);
>     }
>     Thread.sleep(DN_DIRECTORYSCAN_INTERVAL * 1000);
>     cluster.triggerBlockReports();
>
>     // The block needs reconstruction, yet all 3 datanodes are already
>     // "containing" nodes, so no reconstruction target can be chosen.
>     BlockInfo bi = bm.getStoredBlock(oldBlock.getLocalBlock());
>     assertTrue(bm.isNeededReconstruction(bi,
>         bm.countNodes(bi, cluster.getNamesystem().isInStartupSafeMode())));
>     BlockReconstructionWork reconstructionWork = null;
>     fsn.readLock();
>     try {
>       reconstructionWork = bm.scheduleReconstruction(bi, 3);
>     } finally {
>       fsn.readUnlock();
>     }
>     assertNotNull(reconstructionWork);
>     assertEquals(3, reconstructionWork.getContainingNodes().size());
>   } finally {
>     if (cluster != null) {
>       cluster.shutdown();
>     }
>   }
> }
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org