JacksonYao287 commented on code in PR #3963:
URL: https://github.com/apache/ozone/pull/3963#discussion_r1028919138
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -325,10 +329,22 @@ public synchronized void processAll() {
containerManager.getContainers();
ReplicationManagerReport report = new ReplicationManagerReport();
ReplicationQueue newRepQueue = new ReplicationQueue();
+ Map<ContainerID, MoveDataNodePair> pendingMoves =
+ moveManaer.getPendingMove();
Review Comment:
>Does the ReplicationManager really need to know about the pending moves?
In the vast majority of cases there is no need, but I am taking two corner
cases into account.
1. If a Ratis container has only two replicas, r1 (dn1) and r2 (dn2), it is
under-replicated. If we schedule a move for r1 and everything goes well, there
will ultimately be an r3 on dn3 and the container will no longer be
under-replicated. If RM is not aware of this move, it will schedule a
replication to a randomly selected dn4 to handle the under-replicated state.
After that replication is done, the container is over-replicated (4 replicas)
and another deletion will be scheduled. If RM is aware of the move, the extra
replication and the resulting deletion can be avoided.
2. If a container is over-replicated (r1, r2, r3, r4) and we want to move r1
to dn5, MoveManager will send a replication command to dn5. If RM is not
aware of this move and finds the over-replicated container, it may send a
deletion command for r1, which will fail the move.
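To make the two corner cases concrete, here is a minimal sketch of how a replication check could take a pending move into account. It is only an illustration: `PendingMoveSketch`, `Move`, `needsReplication` and `deletionCandidates` are hypothetical names, datanodes are plain strings, and none of this is the actual ReplicationManager or MoveManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names come from the Ozone code base.
final class PendingMoveSketch {

  /** Hypothetical stand-in for a pending move: source and target datanode. */
  static final class Move {
    final String source;
    final String target;

    Move(String source, String target) {
      this.source = source;
      this.target = target;
    }
  }

  /**
   * Corner case 1: count the in-flight copy to the move target as a replica
   * when deciding whether another replication is needed.
   */
  static boolean needsReplication(List<String> replicaNodes, int required,
      Move pending) {
    int expected = replicaNodes.size();
    if (pending != null && !replicaNodes.contains(pending.target)) {
      expected++;
    }
    return expected < required;
  }

  /**
   * Corner case 2: never offer the move source as a deletion candidate,
   * because deleting it would fail the move.
   */
  static List<String> deletionCandidates(List<String> replicaNodes,
      Move pending) {
    List<String> candidates = new ArrayList<>(replicaNodes);
    if (pending != null) {
      candidates.remove(pending.source);
    }
    return candidates;
  }
}
```

With replicas on dn1 and dn2, three required replicas and a pending move dn1 -> dn3, `needsReplication` returns false, so no extra copy to dn4 is scheduled. With replicas on dn1..dn4 and a pending move dn1 -> dn5, `deletionCandidates` leaves out dn1.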
>what if the over-replication processing was "space aware"
I have a [Jira](https://issues.apache.org/jira/browse/HDDS-5278) to discuss
this.
Besides what @lokesh said, another reason is an SCM leader switch. Let's say a
container has r1, r2, r3, r4 on dn1, dn2, dn3, dn4. If rm1 in scm1 (the leader)
finds this container over-replicated and dn1 has the least free space, it will
send a deletion command to dn1. Then scm1 steps down and scm2 becomes the
leader. If some new data has been written to dn3 and it now has the least free
space, rm2 in scm2 will send a deletion command to dn3 to delete r3. Now two
datanodes have received a deletion command while only one replica needs to be
deleted.
This [Jira](https://issues.apache.org/jira/browse/HDDS-4589) makes sure all
the SCMs delete replicas in a certain sequence.
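A small sketch of the leader-switch problem, under stated assumptions: I am not claiming this is the ordering HDDS-4589 uses. The class and method names are hypothetical, and the "certain sequence" is assumed here to be the lexicographic order of the datanode IDs, purely for illustration.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative sketch only; the stable ordering used here is an assumption.
final class StableDeletionOrderSketch {

  /**
   * "Space aware" choice: the datanode with the least free space.
   * The answer depends on the free-space view at the time of the call.
   */
  static String pickByFreeSpace(Map<String, Long> freeSpaceByNode) {
    String chosen = null;
    long least = Long.MAX_VALUE;
    for (Map.Entry<String, Long> e : freeSpaceByNode.entrySet()) {
      if (chosen == null || e.getValue() < least) {
        chosen = e.getKey();
        least = e.getValue();
      }
    }
    return chosen;
  }

  /**
   * Stable choice: the smallest datanode ID, ignoring free space.
   * Every SCM picks the same node for the same replica set.
   * Assumes the map holds at least one replica.
   */
  static String pickByStableOrder(Map<String, Long> freeSpaceByNode) {
    return Collections.min(freeSpaceByNode.keySet());
  }
}
```

If scm1 sees dn1 with the least free space and scm2 later sees dn3 with the least, `pickByFreeSpace` returns dn1 and then dn3, so two deletions go out. `pickByStableOrder` returns dn1 in both views, so the second leader repeats the same deletion instead of issuing a new one.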
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]