sodonnel commented on code in PR #4694:
URL: https://github.com/apache/ozone/pull/4694#discussion_r1190244821
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECUnderReplicationHandler.java:
##########
@@ -124,19 +124,15 @@ public int processAndSendCommands(
return 0;
}
- // don't place reconstructed replicas on exclude nodes, since they already
- // have replicas
- List<DatanodeDetails> excludedNodes = replicas.stream()
- .map(ContainerReplica::getDatanodeDetails)
- .collect(Collectors.toList());
- // DNs that are already waiting to receive replicas cannot be targets
- excludedNodes.addAll(
- pendingOps.stream()
- .filter(containerReplicaOp -> containerReplicaOp.getOpType() ==
- ContainerReplicaOp.PendingOpType.ADD)
- .map(ContainerReplicaOp::getTarget)
- .collect(Collectors.toList()));
+ ReplicationManagerUtil.ExcludedAndUsedNodes excludedAndUsedNodes =
+ ReplicationManagerUtil.getExcludedAndUsedNodes(
+ new ArrayList<>(replicas), Collections.emptySet(), pendingOps,
+ replicationManager);
+ List<DatanodeDetails> excludedNodes
+ = excludedAndUsedNodes.getExcludedNodes();
excludedNodes.addAll(replicationManager.getExcludedNodes());
Review Comment:
The reason we only add `replicationManager.getExcludedNodes()` to the
excludeList in the EC handler is because of the way the command targets work.
For replication commands, we have the sources, and then select the least
loaded one to act as the source for the push replication. If they are all
overloaded we give an error, and we don't consider load on the target right
now. Hence we don't try to exclude any overloaded nodes as targets, as all our
load limiting is based on the number of queued commands. For replication we are
push, so we don't have a limit on the target.
For reconstruction, we have to select target nodes, and one of those targets
must be the place the command is sent to. Therefore we use the exclude list to
avoid selecting already overloaded command targets. However, if there is
decommission or maintenance too, the same exclude list applies to all of the EC
handler, so decommission and maintenance replicate commands would have their
targets restricted by this exclude list too.
It is a valid point - should we avoid using "overloaded with too many
commands" as a target for replication? It would be simple to do it, as we could
just move that one line of code into that new method. It would also make it
consistent with how the EC decommission and maintenance behave. Need to think
about it a bit more.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]