[jira] [Commented] (HDFS-17052) Erasure coding reconstruction failed when num of storageType rack NOT enough

ASF GitHub Bot (Jira) Tue, 27 Jun 2023 20:44:05 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737951#comment-17737951
 ]


ASF GitHub Bot commented on HDFS-17052:
---------------------------------------

zhangshuyan0 commented on code in PR #5759:
URL: https://github.com/apache/hadoop/pull/5759#discussion_r1244628377


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRackFaultTolerant.java:
##########
@@ -192,11 +192,19 @@ private void chooseEvenlyFromRemainingRacks(Node writer,
       } finally {
         excludedNodes.addAll(newExcludeNodes);
       }
+      if (numResultsOflastChoose == results.size()) {
+        Map<String, Integer> nodesPerRack = new HashMap<>();
+        for (DatanodeStorageInfo dsInfo : results) {
+          String rackName = dsInfo.getDatanodeDescriptor().getNetworkLocation();
+          nodesPerRack.merge(rackName, 1, Integer::sum);
+        }
+        bestEffortMaxNodesPerRack = Collections.max(nodesPerRack.values());

Review Comment:
   Could this introduce an infinite loop? Suppose each rack already has one 
chosen node, `bestEffortMaxNodesPerRack` is 2, and no datanode can be chosen in 
this pass. After line 201, `bestEffortMaxNodesPerRack` is then reset to 1, 
which may cause an infinite loop. The maximum should therefore take the old 
value of `bestEffortMaxNodesPerRack` into account to ensure that it increases.
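
To make the concern concrete, here is a minimal standalone sketch, not the code in the PR. `RackBoundSketch`, `naiveMax`, and `guardedMax` are hypothetical names, and rack names are passed as plain strings in place of `DatanodeStorageInfo`. It only shows that recomputing the bound from the chosen nodes alone can lower it, while folding in the old value cannot; whether the real loop also needs an explicit increment is left open here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop.
final class RackBoundSketch {

  // Largest number of chosen nodes on any single rack (0 if none are chosen).
  static int naiveMax(List<String> chosenRacks) {
    Map<String, Integer> nodesPerRack = new HashMap<>();
    for (String rack : chosenRacks) {
      nodesPerRack.merge(rack, 1, Integer::sum);
    }
    return nodesPerRack.isEmpty() ? 0 : Collections.max(nodesPerRack.values());
  }

  // Same count, but the result never drops below the previous bound.
  static int guardedMax(List<String> chosenRacks, int oldMax) {
    return Math.max(oldMax, naiveMax(chosenRacks));
  }

  public static void main(String[] args) {
    // Scenario from the review comment: one chosen node per rack, old bound 2.
    List<String> chosenRacks = Arrays.asList("/rack1", "/rack2", "/rack3");
    int oldMax = 2;
    System.out.println("naive bound:   " + naiveMax(chosenRacks));
    System.out.println("guarded bound: " + guardedMax(chosenRacks, oldMax));
  }
}
```

In the scenario above the naive recomputation yields 1, below the previous bound of 2, whereas the guarded version stays at 2.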





> Erasure coding reconstruction failed when num of storageType rack NOT enough
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-17052
>                 URL: https://issues.apache.org/jira/browse/HDFS-17052
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.4.0
>            Reporter: Hualong Zhang
>            Assignee: Hualong Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: failed reconstruction ec in same rack-1.png, write ec in 
> same rack.png
>
>
> When writing EC data, if the number of racks matching the storageType is 
> insufficient, more than one block is allowed to be written to the same rack.
> !write ec in same rack.png|width=962,height=604!
>  
>  
>  
> However, during EC block recovery, it is not possible to recover on the same 
> rack, which deviates from the expected behavior.
> !failed reconstruction ec in same rack-1.png|width=946,height=413!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

