[
https://issues.apache.org/jira/browse/HDFS-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735220#comment-17735220
]
ASF GitHub Bot commented on HDFS-17054:
---------------------------------------
zhangshuyan0 commented on code in PR #5760:
URL: https://github.com/apache/hadoop/pull/5760#discussion_r1235035757
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:
##########
@@ -4544,25 +4546,32 @@ public NumberReplicas countNodes(BlockInfo b) {
NumberReplicas countNodes(BlockInfo b, boolean inStartupSafeMode) {
NumberReplicas numberReplicas = new NumberReplicas();
Collection<DatanodeDescriptor> nodesCorrupt = corruptReplicas.getNodes(b);
+ HashSet<DatanodeDescriptor> haveComputedAsCorrupted = null;
if (b.isStriped()) {
+ haveComputedAsCorrupted = new HashSet<>();
countReplicasForStripedBlock(numberReplicas, (BlockInfoStriped) b,
- nodesCorrupt, inStartupSafeMode);
+ nodesCorrupt, inStartupSafeMode, haveComputedAsCorrupted);
} else {
for (DatanodeStorageInfo storage : blocksMap.getStorages(b)) {
checkReplicaOnStorage(numberReplicas, b, storage, nodesCorrupt,
- inStartupSafeMode);
+ inStartupSafeMode, haveComputedAsCorrupted);
}
}
return numberReplicas;
}
private StoredReplicaState checkReplicaOnStorage(NumberReplicas counters,
BlockInfo b, DatanodeStorageInfo storage,
- Collection<DatanodeDescriptor> nodesCorrupt, boolean inStartupSafeMode) {
+ Collection<DatanodeDescriptor> nodesCorrupt, boolean inStartupSafeMode,
+ HashSet<DatanodeDescriptor> haveComputedAsCorrupted) {
final StoredReplicaState s;
if (storage.getState() == State.NORMAL) {
final DatanodeDescriptor node = storage.getDatanodeDescriptor();
- if (nodesCorrupt != null && nodesCorrupt.contains(node)) {
+ if (nodesCorrupt != null && nodesCorrupt.contains(node) &&
+ (haveComputedAsCorrupted == null ||
+ !haveComputedAsCorrupted.contains(node))) {
+ if (haveComputedAsCorrupted != null) {
+ haveComputedAsCorrupted.add(node);
Review Comment:
If I understand your code correctly, when the same block group has two
internal blocks on the same datanode, you will count only one of them. IMO,
the current implementation of `CorruptReplicasMap` does not record which
specific internal block on the datanode was corrupt, so how could you confirm
that only one internal block is corrupt?
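To illustrate the concern above, here is a minimal, hypothetical Java sketch (not the real `BlockManager`/`CorruptReplicasMap` API; the class, method, and datanode names are invented for illustration). It shows that deduplicating corrupt counts by datanode reports at most one corrupt replica per node, even when two internal blocks of the same block group on that node may both be corrupt:

```java
import java.util.*;

// Hypothetical sketch: the corrupt-replica map only records *datanodes* that
// hold a corrupt replica of the block group, not which internal block is bad.
public class CorruptCountSketch {
    // storages: datanodeId -> internal block indices stored on that node.
    // corruptNodes: datanodes recorded as holding a corrupt replica.
    static int countCorruptPerDatanodeOnce(Map<String, List<Integer>> storages,
                                           Set<String> corruptNodes) {
        Set<String> seen = new HashSet<>();  // mirrors haveComputedAsCorrupted
        int corrupt = 0;
        for (Map.Entry<String, List<Integer>> e : storages.entrySet()) {
            for (int internalBlock : e.getValue()) {
                // seen.add returns false on the second replica from the same
                // node, so that node contributes at most one corrupt count.
                if (corruptNodes.contains(e.getKey()) && seen.add(e.getKey())) {
                    corrupt++;
                }
            }
        }
        return corrupt;
    }

    public static void main(String[] args) {
        // dn1 holds two internal blocks of the same block group.
        Map<String, List<Integer>> storages = new LinkedHashMap<>();
        storages.put("dn1", Arrays.asList(0, 1));
        storages.put("dn2", Arrays.asList(2));
        Set<String> corruptNodes = new HashSet<>(Arrays.asList("dn1"));
        // Reports 1 even if both internal blocks on dn1 are corrupt,
        // which is exactly the reviewer's question.
        System.out.println(countCorruptPerDatanodeOnce(storages, corruptNodes));
    }
}
```

Under this per-datanode deduplication, the map cannot distinguish "one corrupt internal block on dn1" from "two corrupt internal blocks on dn1"; both cases print 1.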
> Erasure coding: optimize the checkReplicaOnStorage method to avoid repeatedly
> regarding all replicas on one datanode as corrupt.
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17054
> URL: https://issues.apache.org/jira/browse/HDFS-17054
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: farmmamba
> Priority: Critical
> Labels: pull-request-available
>
> Currently, when we execute the countNodes method for a striped block, a
> datanode may hold two EC blocks with different block IDs (because of some
> special cases, which I will find and fix later). One block is LIVE and the
> other is CORRUPT. The current logic counts two CORRUPT replicas because
> corruptReplicas contains the datanode. We should fix it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]