sodonnel commented on pull request #1700:
URL: https://github.com/apache/ozone/pull/1700#issuecomment-745207917
This is a clever solution to the problem; however, I worry it may not work
well in practice. Sorting the ContainerReplica list will use this:
```
@Override
public int compareTo(ContainerReplica that) {
  Preconditions.checkNotNull(that);
  return new CompareToBuilder()
      .append(this.containerID, that.containerID)
      .append(this.datanodeDetails, that.datanodeDetails)
      .build();
}
```
The containerID is fixed for the container, so you are effectively sorting
by the datanode address. This means that, in general, over-replicated
containers from the same pipeline will tend to have their excess replica
removed from the same node every time.
Say we decommission a host, then recommission it. We will have a lot of
containers with 4 replicas. We sort the DN list each time, and there is a good
chance that all the excess replicas will be removed from the same host (the
decommissioned and recommissioned one, or one of the original hosts), rather
than being removed randomly across the cluster. This may result in some
nodes having much more free space than others.
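To illustrate the concern, here is a rough, self-contained simulation. It is only a sketch under assumptions: the datanode names are made up, plain strings stand in for ContainerReplica / DatanodeDetails, and I am assuming the replica picked for removal is the last one in the list (the real code may pick a different position, but any fixed position gives the same skew).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ReplicaRemovalSkew {

  /**
   * Removes one excess replica from each over-replicated container and
   * returns how many removals each (hypothetical) datanode receives.
   */
  static Map<String, Integer> countRemovals(List<String> datanodes,
      int containers, boolean sortFirst, long seed) {
    Random random = new Random(seed);
    Map<String, Integer> removals = new TreeMap<>();
    for (int c = 0; c < containers; c++) {
      // Each container has one replica on every datanode in the list.
      List<String> replicas = new ArrayList<>(datanodes);
      // Replicas are seen in an arbitrary order for each container.
      Collections.shuffle(replicas, random);
      if (sortFirst) {
        // Stand-in for sorting with ContainerReplica.compareTo: the
        // containerID is identical, so only the datanode decides the order.
        Collections.sort(replicas);
      }
      // Assumption: the last replica in the list is the one removed.
      String victim = replicas.get(replicas.size() - 1);
      removals.merge(victim, 1, Integer::sum);
    }
    return removals;
  }

  public static void main(String[] args) {
    List<String> dns = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    System.out.println("sorted:   " + countRemovals(dns, 1000, true, 42L));
    System.out.println("unsorted: " + countRemovals(dns, 1000, false, 42L));
  }
}
```

With sorting enabled, all 1000 removals land on dn4 in this toy setup, while the unsorted run spreads them across the four nodes. A real cluster has many more nodes and pipelines, so the effect would be less extreme, but the bias toward whichever node sorts last within each pipeline remains.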
This suggestion is obviously a much bigger change, but I wonder if it would
be possible to have the DNs provide a list of pending_delete blocks in their
container report / heartbeat, and then we can use that in SCM?
Or, if the DNs detect a new master SCM or a restarted SCM (I am not
up-to-speed on the SCM HA area), could they purge their pending delete list
and wait for new instructions from the new/restarted SCM?