errose28 commented on code in PR #5255: URL: https://github.com/apache/ozone/pull/5255#discussion_r1322165405
########## hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManagerUtil.java: ########## @@ -188,4 +195,111 @@ public List<DatanodeDetails> getUsedNodes() { } } + /** + * This is intended to be call when a container is under replicated, but there + * are no spare nodes to create new replicas on, due to having too many + * unhealthy replicas or quasi-closed replicas which cannot be closed due to + * having a lagging sequence ID. The logic here will select a replica to + * delete, or return null if there are none which can be safely deleted. + * @param containerInfo The container to select a replica to delete from + * @param replicas The list of replicas for the container + * @param pendingOps The list of pending replica operations for the container + * @return A replica to delete, or null if there are none which can be safely + * deleted. + */ + public static ContainerReplica selectUnhealthyReplicaForDelete( + ContainerInfo containerInfo, Set<ContainerReplica> replicas, + List<ContainerReplicaOp> pendingOps, + Function<DatanodeDetails, NodeStatus> nodeStatusFn) { + + for (ContainerReplicaOp op : pendingOps) { + if (op.getOpType() == ContainerReplicaOp.PendingOpType.DELETE) { + // There is at least one pending delete which will free up a node. + // Therefore we do nothing until that delete completes or times out. + LOG.debug("Container {} has pending deletes which will free nodes", + pendingOps); + return null; + } + } + + if (replicas.size() <= 2) { + // We never remove replicas if it will leave us with less than 2 replicas + LOG.debug("There are only {} replicas for container {} so no more will " + + "be deleted", replicas.size(), containerInfo); + return null; + } + + boolean foundMatchingReplica = false; + for (ContainerReplica replica : replicas) { + if (compareState(containerInfo.getState(), replica.getState())) { + foundMatchingReplica = true; + break; + } + } + if (!foundMatchingReplica) { + LOG.debug("No matching replica found for container {} with replicas " + + "{}. No unhealthy replicas can be safely delete.", + containerInfo, replicas); + return null; + } + + List<ContainerReplica> deleteCandidates = new ArrayList<>(); + for (ContainerReplica r : replicas) { + NodeStatus nodeStatus = nodeStatusFn.apply(r.getDatanodeDetails()); + if (nodeStatus == null || !nodeStatus.isHealthy() + || !nodeStatus.isInService()) { + continue; + } + + if (containerInfo.getState() != HddsProtos.LifeCycleState.QUASI_CLOSED && + r.getState() == ContainerReplicaProto.State.QUASI_CLOSED) { + // a quasi_closed replica is a candidate only if its seq id is less + // than the container's + if (r.getSequenceId() < containerInfo.getSequenceId()) { + deleteCandidates.add(r); + } + } else if (r.getState() == ContainerReplicaProto.State.UNHEALTHY) { + deleteCandidates.add(r); + } + } + + deleteCandidates.sort( + Comparator.comparingLong(ContainerReplica::getSequenceId)); + if (containerInfo.getState() == HddsProtos.LifeCycleState.CLOSED) { + return deleteCandidates.size() > 0 ? deleteCandidates.get(0) : null; + } + + if (containerInfo.getState() == HddsProtos.LifeCycleState.QUASI_CLOSED) { + List<ContainerReplica> nonUniqueOrigins = + findNonUniqueDeleteCandidates(replicas, deleteCandidates); + return nonUniqueOrigins.size() > 0 ? nonUniqueOrigins.get(0) : null; Review Comment: This can still get stuck in a 4 node cluster. For example, the following would remain under replicated in a 4 node cluster, or block decommissioning the 4th node: ``` SCM state: Quasi-closed Replica states: quasi(origin 1) quasi(origin 1) unhealthy(origin 2) unhealthy(origin 3) ``` I think we can handle this in a follow-up jira, especially since we are proposing changing #5261 to use this util method as well. The one follow-up will fix it in both places. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org