xBis7 commented on code in PR #5651:
URL: https://github.com/apache/ozone/pull/5651#discussion_r1406876671
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthStatus.java:
##########
@@ -48,8 +48,12 @@ public class ContainerHealthStatus {
int repFactor = container.getReplicationConfig().getRequiredNodes();
this.healthyReplicas = healthyReplicas
.stream()
- .filter(r -> !r.getState()
- .equals((ContainerReplicaProto.State.UNHEALTHY)))
+ // Filter unhealthy replicas and
+ // replicas belonging to out-of-service nodes.
+ .filter(r ->
+ (!r.getDatanodeDetails().isDecommissioned() &&
+ !r.getDatanodeDetails().isMaintenance() &&
Review Comment:
As far as I understand, a node doesn't go offline until its replicas have
been copied to another node. While the node is ENTERING_MAINTENANCE or
DECOMMISSIONING, container replicas are added or removed as needed to
maintain proper replication. The container stays under-replicated until the
copies have been made and the node has successfully gone offline.
Once that is done, the container is correctly replicated: it has 3 healthy,
available replicas and 1 offline replica. SCM doesn't report any
under-replicated or over-replicated containers, but Recon
- on master, counts 1 over-replicated container because it sees 4 replicas
(it makes no distinction between online and offline replicas).
- with this patch, counts 0.
When the offline datanode is then stopped, SCM doesn't report any unhealthy
containers, and
- on master, Recon no longer counts the 1 over-replicated container.
- with this patch, nothing changes in Recon.
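
To make the counting in the scenario above concrete, here is a minimal,
self-contained sketch. It does not use the real Ozone classes:
`ReplicaCountSketch`, `Replica`, `ReplicaState`, `OpState` and the method
names are hypothetical stand-ins, and treating only DECOMMISSIONED and
IN_MAINTENANCE as "offline" is an assumption made for illustration, not a
statement about what `isDecommissioned()` or `isMaintenance()` check.

```java
import java.util.Arrays;
import java.util.List;

public class ReplicaCountSketch {

  // Hypothetical stand-ins for illustration only, not the Ozone enums.
  public enum ReplicaState { CLOSED, UNHEALTHY }

  public enum OpState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  public static final class Replica {
    final ReplicaState state;
    final OpState opState;

    public Replica(ReplicaState state, OpState opState) {
      this.state = state;
      this.opState = opState;
    }
  }

  // Master-style count: only UNHEALTHY replicas are filtered out.
  public static long countMasterStyle(List<Replica> replicas) {
    return replicas.stream()
        .filter(r -> r.state != ReplicaState.UNHEALTHY)
        .count();
  }

  // Patch-style count: replicas on offline nodes are filtered out as well.
  // Assumption: "offline" means DECOMMISSIONED or IN_MAINTENANCE.
  public static long countPatchStyle(List<Replica> replicas) {
    return replicas.stream()
        .filter(r -> r.opState != OpState.DECOMMISSIONED
            && r.opState != OpState.IN_MAINTENANCE
            && r.state != ReplicaState.UNHEALTHY)
        .count();
  }

  // Number of replicas counted beyond the replication factor (never negative).
  public static long excess(long counted, int repFactor) {
    return Math.max(0L, counted - repFactor);
  }

  public static void main(String[] args) {
    int repFactor = 3;
    // 3 healthy replicas on in-service nodes plus 1 on a decommissioned node.
    List<Replica> replicas = Arrays.asList(
        new Replica(ReplicaState.CLOSED, OpState.IN_SERVICE),
        new Replica(ReplicaState.CLOSED, OpState.IN_SERVICE),
        new Replica(ReplicaState.CLOSED, OpState.IN_SERVICE),
        new Replica(ReplicaState.CLOSED, OpState.DECOMMISSIONED));

    // prints "master-style excess: 1"
    System.out.println("master-style excess: "
        + excess(countMasterStyle(replicas), repFactor));
    // prints "patch-style excess: 0"
    System.out.println("patch-style excess: "
        + excess(countPatchStyle(replicas), repFactor));
  }
}
```

With 3 in-service replicas and 1 offline replica, the master-style count
sees 4 replicas and flags an excess of 1, while the patch-style count sees 3
and flags none, which matches the two Recon results described above.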
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]