siddhantsangwan commented on code in PR #4207:
URL: https://github.com/apache/ozone/pull/4207#discussion_r1113903613


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java:
##########
@@ -1613,6 +1614,18 @@ private boolean isOpenContainerHealthy(
         .allMatch(r -> compareState(state, r.getState()));
   }
 
+  private void setHealthStateForClosing(Set<ContainerReplica> replicas,
+                                        ContainerInfo container,
+                                        ReplicationManagerReport report) {
+    if (replicas.size() == 0) {
+      report.incrementAndSample(HealthState.MISSING, container.containerID());
+      report.incrementAndSample(HealthState.UNDER_REPLICATED,
+              container.containerID());
+      report.incrementAndSample(HealthState.MIS_REPLICATED,
+              container.containerID());

Review Comment:
   Yeah, I see that we're also setting the container as mis replicated, under 
replicated and missing further down in that method for closed containers.
   
   Looks like over time we've diverged from the original intention of the 
replication manager report:
   
   ```
    * This class is used by ReplicationManager. Each time ReplicationManager 
runs,
    * it creates a new instance of this class and increments the various 
counters
    * to allow for creating a report on the various container states within the
    * system. There is a counter for each LifeCycleState (open, closing, closed
    * etc) and the sum of each of the lifecycle state counters should equal the
    * total number of containers in SCM. Ie, each container can only be in one 
of
    * the Lifecycle states at any time.
   ```
   It specifies that each container should only be in one state at a time. 
   
   I think we need to decide what will best help with debugging. For example, 
if a container is missing, it's naturally also mis replicated and under 
replicated. We can choose to count it only once as missing or we can count it 
in all three categories, but that needs to be done consistently everywhere.
   
   The new RM does not count a missing container as mis replicated, but it does 
count it as under replicated in `RatisReplicationCheckHandler`. This is because 
it considers mis replication only when there is no under/over replication.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to