sodonnel commented on code in PR #4207:
URL: https://github.com/apache/ozone/pull/4207#discussion_r1086829942
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java:
##########
@@ -1612,6 +1613,20 @@ private boolean isOpenContainerHealthy(
.allMatch(r -> compareState(state, r.getState()));
}
+ private void setHealthStateForClosing(Set<ContainerReplica> replicas,
+ ContainerInfo container,
+ ReplicationManagerReport report) {
+ if (!replicas.stream().
+ anyMatch(r -> compareState(LifeCycleState.OPEN, r.getState()) ||
+ compareState(LifeCycleState.CLOSED, r.getState()))) {
+ report.incrementAndSample(HealthState.MISSING, container.containerID());
Review Comment:
MISSING means the container cannot be read because no replica exists on any
online node. If a container has 3 replicas and all 3 DNs go offline, zero
replicas are available: the replica set in SCM shrinks to zero, so
`replicas.size() == 0` is true.
When a DN goes offline and is marked DEAD (after 10 minutes), its replicas are
removed from SCM automatically; SCM does not remember the last location. If the
node comes back online, it will send a replica report, which adds the replica
location back to SCM.
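
To illustrate the lifecycle described above, here is a minimal, self-contained sketch. It is not the Ozone code: `MissingContainerSketch`, `onReplicaReport`, `onNodeDead` and `isMissing` are hypothetical names, and datanodes are modelled as plain strings. It only shows why an empty replica set is enough to identify a MISSING container under the behaviour described here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of SCM replica tracking; not the real Ozone API. */
class MissingContainerSketch {
  // containerId -> IDs of the datanodes currently known to hold a replica
  private final Map<Long, Set<String>> replicasByContainer = new HashMap<>();

  /** A replica report from a datanode adds (or re-adds) a replica location. */
  void onReplicaReport(String datanode, long containerId) {
    replicasByContainer
        .computeIfAbsent(containerId, k -> new HashSet<>())
        .add(datanode);
  }

  /** When a node is marked DEAD, every replica location on it is dropped. */
  void onNodeDead(String datanode) {
    for (Set<String> nodes : replicasByContainer.values()) {
      nodes.remove(datanode);
    }
  }

  /** In this sketch a container is MISSING when no replica location is known. */
  boolean isMissing(long containerId) {
    Set<String> nodes = replicasByContainer.get(containerId);
    return nodes == null || nodes.isEmpty();
  }

  public static void main(String[] args) {
    MissingContainerSketch scm = new MissingContainerSketch();
    long containerId = 1L;
    scm.onReplicaReport("dn1", containerId);
    scm.onReplicaReport("dn2", containerId);
    scm.onReplicaReport("dn3", containerId);
    System.out.println(scm.isMissing(containerId)); // false: 3 replicas known

    scm.onNodeDead("dn1");
    scm.onNodeDead("dn2");
    scm.onNodeDead("dn3");
    System.out.println(scm.isMissing(containerId)); // true: replica set is empty

    scm.onReplicaReport("dn2", containerId);        // node returns and reports
    System.out.println(scm.isMissing(containerId)); // false: location is back
  }
}
```

Because dead nodes' replicas are dropped from the set, the check does not need to inspect the state of individual replicas to decide that nothing is readable.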
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]