[ https://issues.apache.org/jira/browse/HDFS-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839188#comment-17839188 ]

ASF GitHub Bot commented on HDFS-17401:
---------------------------------------

haiyang1987 commented on code in PR #6597:
URL: https://github.com/apache/hadoop/pull/6597#discussion_r1522764907


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestReconstructStripedBlocks.java:
##########
@@ -575,5 +576,80 @@ public void testReconstructionWithStorageTypeNotEnough() 
throws Exception {
       cluster.shutdown();
     }
   }
+  @Test
+  public void testDeleteOverReplicatedStripedBlock() throws Exception {
+    final HdfsConfiguration conf = new HdfsConfiguration();
+    conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+    conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_CONSIDERLOAD_KEY,
+            false);
+    StorageType[][] st = new StorageType[groupSize + 2][1];
+    for (int i = 0; i < st.length - 1; i++) {
+      st[i] = new StorageType[]{StorageType.SSD};
+    }
+    st[st.length - 1] = new StorageType[]{StorageType.DISK};
+
+    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(groupSize + 2)
+            .storagesPerDatanode(1)
+            .storageTypes(st)
+            .build();
+    cluster.waitActive();
+    DistributedFileSystem fs = cluster.getFileSystem();
+    fs.enableErasureCodingPolicy(
+            StripedFileTestUtil.getDefaultECPolicy().getName());
+    try {
+      fs.mkdirs(dirPath);
+      fs.setErasureCodingPolicy(dirPath,
+              StripedFileTestUtil.getDefaultECPolicy().getName());
+      fs.setStoragePolicy(dirPath, HdfsConstants.ALLSSD_STORAGE_POLICY_NAME);
+      DFSTestUtil.createFile(fs, filePath,
+              cellSize * dataBlocks * 2, (short) 1, 0L);
+      // Stop a dn
+      LocatedBlocks blks = fs.getClient().getLocatedBlocks(filePath.toString(), 0);
+      LocatedStripedBlock block = (LocatedStripedBlock) blks.getLastLocatedBlock();
+      DatanodeInfo dnToStop = block.getLocations()[0];
+
+      MiniDFSCluster.DataNodeProperties dnProp =
+              cluster.stopDataNode(dnToStop.getXferAddr());
+      cluster.setDataNodeDead(dnToStop);
+
+      // Wait for reconstruction to happen
+      DFSTestUtil.waitForReplication(fs, filePath, groupSize, 15 * 1000);
+
+      DatanodeInfo dnToStop2 = block.getLocations()[1];
+      cluster.stopDataNode(dnToStop2.getXferAddr());
+      cluster.setDataNodeDead(dnToStop2);
+      DFSTestUtil.waitForReplication(fs, filePath, groupSize, 15 * 1000);
+
+      // Bring the dn back: 10 internal blocks now
+      cluster.restartDataNode(dnProp);
+      cluster.waitActive();
+      DFSTestUtil.verifyClientStats(conf, cluster);
+
+      // Currently the namenode is able to track the missing block; restart the NN

Review Comment:
   Why do we need to restart the namenode here?
   For lines [629-639]:
   ```
   for (DataNode dn : cluster.getDataNodes()) {
     DataNodeTestUtils.triggerBlockReport(dn);
   }

   BlockManager bm = cluster.getNamesystem().getBlockManager();
   GenericTestUtils.waitFor(
       () -> bm.getPendingDeletionBlocksCount() == 0,
       10, 2000);

   for (DataNode dn : cluster.getDataNodes()) {
     DataNodeTestUtils.triggerHeartbeat(dn);
   }

   for (DataNode dn : cluster.getDataNodes()) {
     DataNodeTestUtils.triggerDeletionReport(dn);
   }
   ```
   Maybe update this logic to avoid using `Thread.sleep(3000)`; what do you think?
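
   For illustration, a minimal sketch of the same polling pattern applied to the final assertion of the test; the condition shown (waiting for the last block group to report exactly `groupSize` locations) is an assumption for this example, not necessarily the exact check in the PR:

   ```
   // Poll every 100 ms and time out after 10 s instead of a fixed
   // Thread.sleep(3000). Uses the test's fs, filePath and groupSize.
   GenericTestUtils.waitFor(() -> {
     try {
       // Hypothetical condition: the excess internal block is gone once
       // the last block group reports exactly groupSize locations again.
       LocatedStripedBlock last = (LocatedStripedBlock) fs.getClient()
           .getLocatedBlocks(filePath.toString(), 0).getLastLocatedBlock();
       return last.getLocations().length == groupSize;
     } catch (IOException e) {
       return false;
     }
   }, 100, 10000);
   ```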





> EC: Excess internal block may not be able to be deleted correctly when it's 
> stored in fallback storage
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17401
>                 URL: https://issues.apache.org/jira/browse/HDFS-17401
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.6
>            Reporter: Ruinan Gu
>            Assignee: Ruinan Gu
>            Priority: Major
>              Labels: pull-request-available
>
> Excess internal block can't be deleted correctly when it's stored in 
> fallback storage.
> Simple case:
> An EC-RS-6-3-1024k file is stored using the ALL_SSD storage policy (SSD is 
> the default storage type and DISK is the fallback storage type). Suppose 
> the block group is as follows:
> [0(SSD), 0(SSD), 1(SSD), 2(SSD), 3(SSD), 4(SSD), 5(SSD), 6(SSD), 7(SSD), 
> 8(DISK)]
> There are two index 0 internal blocks, and one of them should be chosen for 
> deletion. But the current implementation chooses the index 0 internal blocks 
> as candidates while picking DISK as the excess storage type. As a result, 
> the excess storage type (DISK) does not correspond to the excess internal 
> blocks' storage type (SSD), and the excess internal block cannot be deleted.
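
For illustration only, a minimal self-contained Java sketch of the mismatch described above; the selection logic below is a made-up stand-in for the real BlockManager code, kept only to show why an excess type of DISK can never match deletion candidates stored on SSD:

```
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExcessTypeMismatchDemo {
  enum StorageType { SSD, DISK }

  public static void main(String[] args) {
    // Block group from the report: index 0 is duplicated (both copies on
    // SSD) and index 8 sits in the DISK fallback storage.
    List<StorageType> stored = Arrays.asList(
        StorageType.SSD, StorageType.SSD,  // two index-0 internal blocks
        StorageType.SSD, StorageType.SSD, StorageType.SSD, StorageType.SSD,
        StorageType.SSD, StorageType.SSD, StorageType.SSD,
        StorageType.DISK);                 // index 8 in fallback storage

    // Stand-in for the flawed step: ALL_SSD prefers SSD everywhere, so the
    // storage type flagged as "excess" is the one outside the preferred
    // list -- the DISK fallback.
    List<StorageType> excessTypes = new ArrayList<>();
    for (StorageType t : stored) {
      if (t != StorageType.SSD) {
        excessTypes.add(t);
      }
    }
    System.out.println("excess types = " + excessTypes);  // [DISK]

    // The deletion candidates are the duplicated index-0 blocks, both SSD.
    List<StorageType> candidates =
        Arrays.asList(StorageType.SSD, StorageType.SSD);

    // Deletion only proceeds when a candidate's storage type appears in
    // the excess-type list; DISK never matches SSD, so no candidate is
    // chosen and the duplicate index-0 block survives.
    boolean deletable = candidates.stream().anyMatch(excessTypes::contains);
    System.out.println("deletable candidate found = " + deletable);  // false
  }
}
```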


