swamirishi commented on code in PR #4006:
URL: https://github.com/apache/ozone/pull/4006#discussion_r1040187251


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/SCMCommonPlacementPolicy.java:
##########
@@ -426,4 +451,67 @@ public boolean isValidNode(DatanodeDetails datanodeDetails,
     }
     return false;
   }
+
+  /**
+   * Given a set of replicas of a container which are
+   * neither underreplicated nor overreplicated,
+   * return a set of replicas to copy to another node to fix misreplication.
+   * @param replicas
+   */
+  @Override
+  public Set<ContainerReplica> replicasToCopyToFixMisreplication(
+         Set<ContainerReplica> replicas) {
+    Map<Node, List<ContainerReplica>> placementGroupReplicaIdMap
+            = replicas.stream().collect(Collectors.groupingBy(replica ->
+            this.getPlacementGroup(replica.getDatanodeDetails())));
+
+    int totalNumberOfReplicas = replicas.size();
+    int requiredNumberOfPlacementGroups =
+            getRequiredRackCount(totalNumberOfReplicas);
+    int additionalNumberOfRacksRequired = Math.max(
+            requiredNumberOfPlacementGroups - placementGroupReplicaIdMap.size(),
+            0);
+    int replicasPerPlacementGroup =
+            getMaxReplicasPerRack(totalNumberOfReplicas);
+    Set<ContainerReplica> copyReplicaSet = Sets.newHashSet();
+
+    for (List<ContainerReplica> replicaList: placementGroupReplicaIdMap
+            .values()) {
+      if (replicaList.size() > replicasPerPlacementGroup) {
+        List<ContainerReplica> replicasToBeCopied = replicaList.stream()
+                .limit(replicaList.size() - replicasPerPlacementGroup)
+                .collect(Collectors.toList());
+        copyReplicaSet.addAll(replicasToBeCopied);
+        replicaList.removeAll(replicasToBeCopied);
+      }
+    }
+    if (additionalNumberOfRacksRequired > copyReplicaSet.size()) {

Review Comment:
   This wouldn't work when, say, we have 5 replicas and 4 racks, and the
   current placement is:
   Rack 1: 2
   Rack 2: 2
   Rack 3: 1
   In this case one of the replicas from either Rack 1 or Rack 2 has to be
   copied. The max replicas per rack here is 2:
   (5 / 4) + (5 % 4 = 1) = 1 + 1 = 2, so no rack exceeds the limit and the
   loop above selects nothing.
   That is why I added another check on additionalNumberOfRacksRequired. It
   runs the same algorithm, but compares the rack with the most replicas
   against the rack with the second-highest number of replicas, instead of
   directly removing the excess replicas.
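
To make the 5-replica / 4-rack case concrete, here is a minimal standalone sketch. It is not the PR's code: the class and every helper name are hypothetical, the per-rack cap is assumed to be the ceiling of replicas / racks (which matches the arithmetic above), and the second pass is only one possible reading of "take from the rack with the most replicas".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Ozone.
class MisreplicationSketch {

  // Assumed cap: ceiling of replicas / racks.
  static int maxReplicasPerRack(int replicas, int racks) {
    return replicas / racks + Math.min(replicas % racks, 1);
  }

  // Number of replicas a per-rack "excess over the cap" loop would select.
  static int excessOverCap(List<Integer> replicasPerRack, int cap) {
    int excess = 0;
    for (int count : replicasPerRack) {
      excess += Math.max(count - cap, 0);
    }
    return excess;
  }

  // Racks that are still unused relative to the required rack count.
  static int additionalRacksRequired(List<Integer> replicasPerRack,
      int requiredRacks) {
    return Math.max(requiredRacks - replicasPerRack.size(), 0);
  }

  // Assumed second pass: take one replica at a time from the currently
  // fullest rack and return the indexes of the source racks.
  static List<Integer> pickFromFullestRacks(List<Integer> replicasPerRack,
      int toMove) {
    List<Integer> counts = new ArrayList<>(replicasPerRack);
    List<Integer> sources = new ArrayList<>();
    if (counts.isEmpty()) {
      return sources;
    }
    for (int i = 0; i < toMove; i++) {
      int fullest = 0;
      for (int r = 1; r < counts.size(); r++) {
        if (counts.get(r) > counts.get(fullest)) {
          fullest = r;
        }
      }
      // Moving a rack's only replica would not increase the rack count.
      if (counts.get(fullest) <= 1) {
        break;
      }
      counts.set(fullest, counts.get(fullest) - 1);
      sources.add(fullest);
    }
    return sources;
  }

  public static void main(String[] args) {
    List<Integer> placement = Arrays.asList(2, 2, 1); // Rack 1, 2, 3
    int cap = maxReplicasPerRack(5, 4);
    int excess = excessOverCap(placement, cap);
    int missingRacks = additionalRacksRequired(placement, 4);
    System.out.println("cap=" + cap + " excess=" + excess
        + " missingRacks=" + missingRacks);
    // The excess pass alone selects nothing, so a second pass is needed.
    if (missingRacks > excess) {
      System.out.println("take from rack indexes: "
          + pickFromFullestRacks(placement, missingRacks - excess));
    }
  }
}
```

With the placement above, the excess-over-cap pass finds nothing to copy while one rack is still missing, which is the gap the extra check is meant to cover.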


