xBis7 commented on code in PR #5651:
URL: https://github.com/apache/ozone/pull/5651#discussion_r1406306274


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthStatus.java:
##########
@@ -48,8 +48,12 @@ public class ContainerHealthStatus {
     int repFactor = container.getReplicationConfig().getRequiredNodes();
     this.healthyReplicas = healthyReplicas
         .stream()
-        .filter(r -> !r.getState()
-            .equals((ContainerReplicaProto.State.UNHEALTHY)))
+        // Filter unhealthy replicas and
+        // replicas belonging to out-of-service nodes.
+        .filter(r ->
+            (!r.getDatanodeDetails().isDecommissioned() &&
+             !r.getDatanodeDetails().isMaintenance() &&

Review Comment:
   > It is "OK" for a maintenance replica to be offline
   
   But the issue here was that Recon was still counting it as online. 
   
   > The definition of maintenance is that one or two replicas out of 3 can be 
offline and the container is still considered healthy, so I am not sure if it 
is correct to just assume a maintenance copy is offline.
   
   Here Recon was displaying the replicas in maintenance as unhealthy and in 
the over-replicated column.
   
   > The concern I have is that Recon shows different counts for 
under-replicated than the RM report and it can cause confusion to users.
   
   The SCM container report doesn't count replicas that are in decommission or 
maintenance, while Recon was counting them as unhealthy. The purpose of 
this patch is to make Recon consistent with SCM.
   
   @sodonnel Check the example in the jira description. Say I have 5 
datanodes and a container with 3 replicas. If 2 of the replica nodes go into 
decommission or maintenance, then 2 new replicas will be created on other 
nodes. In total, the container info will list 5 replicas, but 2 of them will 
be offline and the SCM container report won't print any unhealthy replicas. 
Recon, on the other hand, will still count the 2 offline replicas as online 
and will display the container as over-replicated.
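
To make the arithmetic in that example concrete, here is a minimal, self-contained sketch. It is not the Recon code: `OpState`, `Replica`, and both counting methods are hypothetical stand-ins that I made up for illustration, and it assumes the replication factor of 3 from the example above.

```java
import java.util.Arrays;
import java.util.List;

public class ReplicaCountSketch {

  // Hypothetical stand-in for a node's operational state (not the Ozone enum).
  enum OpState { IN_SERVICE, DECOMMISSIONED, IN_MAINTENANCE }

  // Hypothetical minimal replica: only the two fields this sketch needs.
  static final class Replica {
    final OpState nodeState;
    final boolean unhealthy;

    Replica(OpState nodeState, boolean unhealthy) {
      this.nodeState = nodeState;
      this.unhealthy = unhealthy;
    }
  }

  // Counting before the patch: only UNHEALTHY replicas are filtered out.
  static long countNotUnhealthy(List<Replica> replicas) {
    return replicas.stream()
        .filter(r -> !r.unhealthy)
        .count();
  }

  // Counting after the patch: replicas on out-of-service nodes are dropped too.
  static long countInService(List<Replica> replicas) {
    return replicas.stream()
        .filter(r -> !r.unhealthy && r.nodeState == OpState.IN_SERVICE)
        .count();
  }

  // The scenario from the comment: 2 of the 3 original replicas sit on nodes
  // that left service, and 2 new replicas were created on other nodes.
  static List<Replica> exampleReplicas() {
    return Arrays.asList(
        new Replica(OpState.IN_SERVICE, false),
        new Replica(OpState.DECOMMISSIONED, false),
        new Replica(OpState.IN_MAINTENANCE, false),
        new Replica(OpState.IN_SERVICE, false),
        new Replica(OpState.IN_SERVICE, false));
  }

  public static void main(String[] args) {
    int repFactor = 3; // assumed replication factor from the example
    List<Replica> replicas = exampleReplicas();

    long before = countNotUnhealthy(replicas);
    long after = countInService(replicas);

    System.out.println("before: " + before + ", over-replicated = " + (before > repFactor));
    System.out.println("after:  " + after + ", over-replicated = " + (after > repFactor));
  }
}
```

With only the unhealthy filter, all 5 replicas are counted against a factor of 3, so the container looks over-replicated. With the out-of-service filter added, the count is 3 and matches what the SCM report implies.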
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]