[
https://issues.apache.org/jira/browse/HDFS-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737438#comment-17737438
]
ASF GitHub Bot commented on HDFS-17052:
---------------------------------------
zhangshuyan0 commented on PR #5759:
URL: https://github.com/apache/hadoop/pull/5759#issuecomment-1608786418
@zhtttylz How about combining the two solutions above? That may avoid the
drawbacks of both options.
```java
private void chooseEvenlyFromRemainingRacks(Node writer,
    Set<Node> excludedNodes, long blocksize, int maxNodesPerRack,
    List<DatanodeStorageInfo> results, boolean avoidStaleNodes,
    EnumMap<StorageType, Integer> storageTypes, int totalReplicaExpected,
    NotEnoughReplicasException e) throws NotEnoughReplicasException {
  int numResultsOflastChoose = 0;
  NotEnoughReplicasException lastException = e;
  int bestEffortMaxNodesPerRack = maxNodesPerRack;
  while (results.size() != totalReplicaExpected &&
      bestEffortMaxNodesPerRack < totalReplicaExpected) {
    // Exclude the chosen nodes
    final Set<Node> newExcludeNodes = new HashSet<>();
    for (DatanodeStorageInfo resultStorage : results) {
      addToExcludedNodes(resultStorage.getDatanodeDescriptor(),
          newExcludeNodes);
    }
    LOG.trace("Chosen nodes: {}", results);
    LOG.trace("Excluded nodes: {}", excludedNodes);
    LOG.trace("New Excluded nodes: {}", newExcludeNodes);
    final int numOfReplicas = totalReplicaExpected - results.size();
    numResultsOflastChoose = results.size();
    try {
      chooseOnce(numOfReplicas, writer, newExcludeNodes, blocksize,
          ++bestEffortMaxNodesPerRack, results, avoidStaleNodes,
          storageTypes);
    } catch (NotEnoughReplicasException nere) {
      lastException = nere;
    } finally {
      excludedNodes.addAll(newExcludeNodes);
    }
    if (numResultsOflastChoose == results.size()) {
      // No progress in this round: jump the cap up to the largest number of
      // nodes already chosen on any single rack, instead of incrementing
      // one step at a time.
      Map<String, Integer> nodesPerRack = new HashMap<>();
      for (DatanodeStorageInfo dsInfo : results) {
        String rackName =
            dsInfo.getDatanodeDescriptor().getNetworkLocation();
        nodesPerRack.merge(rackName, 1, Integer::sum);
      }
      for (int numNodes : nodesPerRack.values()) {
        if (numNodes > bestEffortMaxNodesPerRack) {
          bestEffortMaxNodesPerRack = numNodes;
        }
      }
    }
  }
  if (numResultsOflastChoose != totalReplicaExpected) {
    LOG.debug("Best effort placement failed: expecting {} replicas, only "
        + "chose {}.", totalReplicaExpected, numResultsOflastChoose);
    throw lastException;
  }
}
```
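For readers skimming the patch, the core retry idea can be condensed into a toy example. The class and method names below (`BestEffortPlacement`, `place`) are invented for illustration, and the rack/node bookkeeping is vastly simplified compared to the real `BlockPlacementPolicy` machinery: keep choosing under a per-rack cap, and relax the cap only when a round makes no progress.

```java
import java.util.*;

// Hypothetical standalone sketch (NOT Hadoop code) of best-effort placement:
// try to spread replicas evenly across racks, and only raise the per-rack
// cap when a placement round adds nothing.
public class BestEffortPlacement {
  // racks maps rack name -> number of available nodes on that rack.
  // Returns one entry (the rack name) per placed replica.
  static List<String> place(Map<String, Integer> racks,
      int totalReplicaExpected) {
    List<String> results = new ArrayList<>();
    Map<String, Integer> used = new HashMap<>();
    int cap = 1;  // start with at most one replica per rack
    while (results.size() < totalReplicaExpected
        && cap <= totalReplicaExpected) {
      int before = results.size();
      for (Map.Entry<String, Integer> entry : racks.entrySet()) {
        String rack = entry.getKey();
        // Place on this rack up to the cap, bounded by available nodes.
        while (results.size() < totalReplicaExpected
            && used.getOrDefault(rack, 0) < Math.min(cap, entry.getValue())) {
          used.merge(rack, 1, Integer::sum);
          results.add(rack);
        }
      }
      if (results.size() == before) {
        cap++;  // no progress this round: relax the per-rack cap and retry
      }
    }
    return results;  // may be short of the target: best effort only
  }
}
```

With 2 racks of 3 nodes each and 5 replicas requested, the sketch first places one replica per rack, then relaxes the cap twice, ending with a 3/2 split rather than failing outright.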
> Erasure coding reconstruction failed when num of storageType rack NOT enough
> ----------------------------------------------------------------------------
>
> Key: HDFS-17052
> URL: https://issues.apache.org/jira/browse/HDFS-17052
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Affects Versions: 3.4.0
> Reporter: Hualong Zhang
> Assignee: Hualong Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: failed reconstruction ec in same rack-1.png, write ec in
> same rack.png
>
>
> When writing EC data, if the number of racks matching the storageType is
> insufficient, more than one block is allowed to be written to the same rack.
> !write ec in same rack.png|width=962,height=604!
>
> However, during EC block recovery, it is not possible to recover on the same
> rack, which deviates from the expected behavior.
> !failed reconstruction ec in same rack-1.png|width=946,height=413!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]