devmadhuu commented on PR #5651:
URL: https://github.com/apache/ozone/pull/5651#issuecomment-1857240460

   > @devmadhuu Thanks for the comments.
   > 
   > > isMissing() still considers total replicas irrespective of 
healthy/unhealthy. Shouldn't we consider only HealthyReplicas here ?
   > 
   > I changed that because I think it's a discrepancy. Missing container for 
the SCM means an empty replica list, regardless of healthy/unhealthy. If a 
container has only 1 replica and that replica is unhealthy, it won't be 
considered missing for SCM but Recon would be reporting it as such. I got that 
from these places in the code
   > 
   >     * `ReplicationManager` marks a container missing 
[here](https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/health/ClosingContainerHandler.java#L74-L78),
 the container replica list has ALL known replicas.
   > 
   >     * `LegacyReplicationManager` marks a container missing 
[here](https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java#L535-L539)
 and 
[here](https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java#L1834-L1837).
   > 
   > 
   > Please correct me if I misunderstood and the checked list contain only 
healthy containers.
   > 
   > > And we had this JIRA 
[HDDS-9633](https://issues.apache.org/jira/browse/HDDS-9633) to sync Recon 
container stats with RM container report.
   > 
   > I wasn't aware of that. Thanks.
   > 
   > > Is this PR covering all ?
   > 
   > This is covering `over-replication`, `under-replication` and `missing`. I 
think we can get rid of `replicaDelta` in Recon entirely and reuse SCM code for 
all calculations. I didn't want to move ahead with it because the scope will 
grow out of hand. We will have to modify all the Recon tests for containers.
   > 
   > I also haven't changed anything about `mis-replication`. I have a PR open 
for an issue related to that.
   > 
   > > Can you add test matching admin CLI RM container report numbers matching 
for various states with Recon showing container stats or matching.
   > 
   > Yes, let me look into an integration test or a robot test. I don't know 
which is easier at the moment. We should be able to use `curl` to get the data 
from Recon UI.
   
   @xBis7 thanks for continue working on this, As I understand Recon also was 
considering the container as missing when all 3 replicas are zero, so both SCM 
and Recon were treating missing container as same. You check the original code 
of Recon ContainerHealthStatus#isMissing method where Recon was using 
numReplicas == 0 match..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to