siddhantsangwan commented on code in PR #5562:
URL: https://github.com/apache/ozone/pull/5562#discussion_r1387588993


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java:
##########
@@ -1340,23 +1342,52 @@ private void handleAllReplicasUnhealthy(ContainerInfo 
container,
       ReplicationManagerReport report) {
 
     List<ContainerReplica> replicas = replicaSet.getReplicas();
-    int excessReplicas = replicas.size() -
-        container.getReplicationConfig().getRequiredNodes();
-    int missingReplicas = excessReplicas * -1;
 
-    if (missingReplicas > 0) {
+    RatisContainerReplicaCount unhealthyReplicaSet =
+        new RatisContainerReplicaCount(container,
+            new HashSet<>(replicaSet.getReplicas()),
+            getPendingOps(container.containerID()),
+            minHealthyForMaintenance,
+            true);
+
+    if (unhealthyReplicaSet.isUnderReplicated()) {
       handleUnderReplicatedAllUnhealthy(container, replicas,
-          placementStatus, missingReplicas, report);
-    } else if (excessReplicas > 0) {
+          placementStatus, unhealthyReplicaSet.additionalReplicaNeeded(),
+          report);
+    } else if (unhealthyReplicaSet.isOverReplicated()) {
       handleOverReplicatedAllUnhealthy(container, replicas,
-          excessReplicas, report);
+          unhealthyReplicaSet.getExcessRedundancy(true), report);
     } else {
       // We have the correct number of unhealthy replicas. See if any of them
       // can be closed.
       closeReplicasIfPossible(container, replicas);
     }
   }
 
+  /**
+   * Transform the Legacy inflight operation in the pendingOps format.
+   * @param containerID The contaiuner to get the pending ops for.
+   * @return A list of pendingOp, or an empty list if none exist.
+   */
+  private List<ContainerReplicaOp> getPendingOps(ContainerID containerID) {
+    List<ContainerReplicaOp> pendingOps = new ArrayList<>();
+    List<InflightAction> inflightActions = 
inflightReplication.get(containerID);
+    if (inflightActions != null) {
+      for (InflightAction a : inflightActions) {

Review Comment:
   Checking the `InflightMap` class, all accesses to the inflight replication 
and deletion maps are synchronized. I wonder if we can move this entire method 
to that class and follow the same pattern?
   
   Also, while the command deadlines don't matter in our current context, it's 
possible to figure them out using `InflightAction#getTime` and 
`rmConf.getEventTimeout()`.
   
   I'm thinking of something like this:
   ```
       List<ContainerReplicaOp> toContainerReplicaOp(ContainerID id,
           long eventTimeoutDuration) {
         final List<InflightAction> actions = get(id);
         if (actions == null) {
           return Collections.emptyList();
         }
   
         List<ContainerReplicaOp> opsList = new ArrayList<>();
         synchronized (actions) {
           for (InflightAction action : actions) {
             ContainerReplicaOp.PendingOpType pendingOpType = isReplication() ?
                 ContainerReplicaOp.PendingOpType.ADD :
                 ContainerReplicaOp.PendingOpType.DELETE;
             ContainerReplicaOp op = new ContainerReplicaOp(pendingOpType,
                 action.getDatanode(), 0, action.getTime() + 
eventTimeoutDuration);
             opsList.add(op);
           }
         }
   
         return opsList;
       }
   ```



##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestLegacyReplicationManager.java:
##########
@@ -1582,6 +1583,43 @@ public void 
testUnderReplicatedWithOnlyUnhealthyReplicas(int numReplicas)
       assertUnderReplicatedCount(1);
     }
 
+    @ParameterizedTest
+    @EnumSource(value = LifeCycleState.class,
+        names = {"CLOSED", "QUASI_CLOSED"})
+    public void testUnderReplicatedWithOnlyUnhealthyReplicasDecommission(
+        LifeCycleState state)
+        throws Exception {
+      final ContainerInfo container = createContainer(state);
+      for (int i = 0; i < 2; i++) {
+        addReplica(container, NodeStatus.inServiceHealthy(), UNHEALTHY);
+      }
+      addReplica(container, new NodeStatus(DECOMMISSIONING, HEALTHY),
+          UNHEALTHY);
+      assertReplicaScheduled(1);
+      assertUnderReplicatedCount(1);
+      // Run again, and there should be a pending add scheduled, so nothing
+      // else should get scheduled.
+      assertReplicaScheduled(0);
+    }
+
+    @ParameterizedTest
+    @EnumSource(value = LifeCycleState.class,
+        names = {"CLOSED", "QUASI_CLOSED"})
+    public void testOverReplicatedWithOnlyUnhealthyReplicas()

Review Comment:
   Forgot to add LifeCycleState as a param here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to