sodonnel commented on a change in pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#discussion_r790943727
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1116,19 +1116,37 @@ private void handleUnderReplicatedContainer(final
ContainerInfo container,
.stream()
.map(action -> action.datanode)
.collect(Collectors.toList());
- final List<DatanodeDetails> source = replicas.stream()
- .filter(r ->
- r.getState() == State.QUASI_CLOSED ||
- r.getState() == State.CLOSED)
- // Exclude stale and dead nodes. This is particularly important for
- // maintenance nodes, as the replicas will remain present in the
- // container manager, even when they go dead.
- .filter(r ->
- getNodeStatus(r.getDatanodeDetails()).isHealthy())
- .filter(r -> !deletionInFlight.contains(r.getDatanodeDetails()))
- .sorted((r1, r2) -> r2.getSequenceId().compareTo(r1.getSequenceId()))
- .map(ContainerReplica::getDatanodeDetails)
- .collect(Collectors.toList());
+ final List<DatanodeDetails> source;
+ final List<DatanodeDetails> excludeList;
+
+ //there may be a scenario that a replica is in the replica set
+ //when `source` is being generated, and then removed from replica set
+ //somehow(maybe by containerReportHandler#processMissingReplicas).
+ //when `excludeList` is being generated, the replica is not included.
+ //so this replica will not be definitely evicted from `selectedDatanodes`
+ //this will lead to a scenario that we send a replication command
+ //to a datanode to replicate a container replica from itself.
+ synchronized (replicas) {
Review comment:
This synchronization worries me. We are somehow supposed to know to
synchronize on this replica object from deep within the containerManager. There
is a chance that other code areas could run into problems if this map is not
consistent between when it is read and used.
I wonder if we would be better returning a new isolated map from
containerStateMap. Eg, it would become something like:
```
public Set<ContainerReplica> getContainerReplicas(
final ContainerID containerID) {
Preconditions.checkNotNull(containerID);
final Set<ContainerReplica> replicas = replicaMap.get(containerID);
return replicas == null ? null : Collections.unmodifiableSet(new
HashSet<>(replica));
}
```
Any callers to get the replicas have a set that does not change until they
request the replicas again?
If we don't want to change the inners of ContainerStateMap, we could do
something similar inside ReplicationManager and form a new set from the
returned set.
It would better to change the source of the map though, otherwise another
code area could run into a similar problem.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]