JacksonYao287 commented on code in PR #3963:
URL: https://github.com/apache/ozone/pull/3963#discussion_r1028919138


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -325,10 +329,22 @@ public synchronized void processAll() {
         containerManager.getContainers();
     ReplicationManagerReport report = new ReplicationManagerReport();
     ReplicationQueue newRepQueue = new ReplicationQueue();
+    Map<ContainerID, MoveDataNodePair> pendingMoves =
+        moveManaer.getPendingMove();

Review Comment:
   >Does the ReplicationManager really need to know about the pending moves?
   
   In the vast majority of cases, no. But I took two corner cases into account.
   1. If a Ratis container has only two replicas, r1 (dn1) and r2 (dn2), it is
under-replicated. If we schedule a move of r1 to dn3 and everything goes well,
there will ultimately be a replica r3 on dn3 and the container is no longer
under-replicated.
   If RM is not aware of this move, it will schedule a replication to a
randomly selected dn4 to fix the under-replication. Once that replication is
done, the container is over-replicated (4 replicas) and another deletion will
be scheduled. If RM is aware of the move, that extra deletion can be avoided.
   
   2. If a container is over-replicated (r1, r2, r3, r4) and we want to move r1
to dn5, MoveManager will send a replication command to dn5. If RM is not aware
of this move and finds this over-replicated container, it may send a deletion
command to dn1 for r1, which will make the move fail.
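   A minimal sketch of both corner cases, assuming a container is described only by the set of datanodes holding its replicas and that a pending move is known by its source and target datanodes. `PendingMoveAwareness`, `replicasToSchedule` and `deletionCandidates` are hypothetical names, not Ozone APIs, and the real RM logic is more involved:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two corner cases; none of these names are
// Ozone APIs, and datanodes are plain strings for brevity.
final class PendingMoveAwareness {

  /**
   * Case 1: how many new replicas RM should schedule. Following the
   * scenario above, the target of an in-flight move is counted as an
   * incoming replica. Pass null when there is no pending move.
   */
  static int replicasToSchedule(Set<String> replicaNodes,
      String pendingMoveTarget, int requiredReplicas) {
    int expected = replicaNodes.size();
    if (pendingMoveTarget != null
        && !replicaNodes.contains(pendingMoveTarget)) {
      expected++;
    }
    return Math.max(0, requiredReplicas - expected);
  }

  /**
   * Case 2: which replicas to delete from an over-replicated container,
   * skipping datanodes involved in a pending move so the move is not failed.
   */
  static List<String> deletionCandidates(List<String> replicaNodes,
      Set<String> pendingMoveNodes, int requiredReplicas) {
    int excess = replicaNodes.size() - requiredReplicas;
    List<String> victims = new ArrayList<>();
    for (String dn : replicaNodes) {
      if (victims.size() >= excess) {
        break; // enough excess replicas selected (or none were excess)
      }
      if (!pendingMoveNodes.contains(dn)) {
        victims.add(dn);
      }
    }
    return victims;
  }

  public static void main(String[] args) {
    Set<String> twoReplicas = Set.of("dn1", "dn2");
    // Unaware of the move: one extra copy gets scheduled (e.g. to dn4).
    System.out.println(replicasToSchedule(twoReplicas, null, 3));
    // Aware of the move r1 -> dn3: nothing to schedule.
    System.out.println(replicasToSchedule(twoReplicas, "dn3", 3));

    List<String> fourReplicas = List.of("dn1", "dn2", "dn3", "dn4");
    // Unaware of the move dn1 -> dn5: the move source may be picked.
    System.out.println(deletionCandidates(fourReplicas, Set.of(), 3));
    // Aware of the move: dn1 and dn5 are left alone.
    System.out.println(deletionCandidates(fourReplicas, Set.of("dn1", "dn5"), 3));
  }
}
```

In both methods the only extra input is the pending-move information, which is why RM needs to see it at all.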
   
   >what if the over-replication processing was "space aware"
   
   I have a [JIRA](https://issues.apache.org/jira/browse/HDDS-5278) that
discusses this.
   Besides what @lokesh said, another reason is SCM leader switchover. Let's
say a container has replicas r1, r2, r3, r4 on dn1, dn2, dn3, dn4. If rm1 in
scm1 (the leader) finds that this container is over-replicated and dn1 (holding
r1) has the least free space, it will send a deletion command to dn1. Then scm1
steps down and scm2 becomes the leader. If new data has been written to dn3 in
the meantime and dn3 now has the least free space, rm2 in scm2 will send a
deletion command to dn3 to delete r3. Now two datanodes receive a deletion
command even though only one replica needs to be deleted.
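   To make the race concrete, here is a small hypothetical sketch of a purely space-aware choice evaluated by two leaders with different snapshots of datanode free space. The free-space numbers are made up, and `SpaceAwareDeletion` / `pickLeastFreeSpace` are illustrative names, not Ozone code:

```java
import java.util.Map;

// Hypothetical sketch: free space is in arbitrary, made-up units.
final class SpaceAwareDeletion {

  /** Picks the datanode with the least free space as the deletion target. */
  static String pickLeastFreeSpace(Map<String, Long> freeSpaceByNode) {
    return freeSpaceByNode.entrySet().stream()
        .min(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElseThrow();
  }

  public static void main(String[] args) {
    // Snapshot seen by scm1 while it is leader: dn1 is the fullest.
    Map<String, Long> seenByScm1 =
        Map.of("dn1", 10L, "dn2", 50L, "dn3", 40L, "dn4", 60L);
    // After new writes land on dn3 (and dn1 has not yet applied the first
    // deletion), scm2 sees dn3 as the fullest.
    Map<String, Long> seenByScm2 =
        Map.of("dn1", 10L, "dn2", 50L, "dn3", 5L, "dn4", 60L);

    // Two different replicas receive deletion commands for one excess replica.
    System.out.println(pickLeastFreeSpace(seenByScm1) + " "
        + pickLeastFreeSpace(seenByScm2));
  }
}
```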
   
   This [JIRA](https://issues.apache.org/jira/browse/HDDS-4589) makes sure all
the SCMs delete replicas in the same, consistent order.
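   One way to get such an order is sketched here, assuming it is keyed on a stable attribute such as the datanode ID. This is a hypothetical choice for illustration, not necessarily what HDDS-4589 implements, and `DeterministicDeletion` / `pickExcess` are made-up names:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; the ordering key is an assumption, not HDDS-4589's.
final class DeterministicDeletion {

  /**
   * Chooses excess replicas using only a stable key (the datanode ID), so
   * any SCM leader evaluating the same replica set picks the same nodes.
   */
  static List<String> pickExcess(List<String> replicaNodes,
      int requiredReplicas) {
    int excess = Math.max(0, replicaNodes.size() - requiredReplicas);
    return replicaNodes.stream()
        .sorted(Comparator.naturalOrder())
        .limit(excess)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // scm1 and scm2 may list the replicas in different orders...
    List<String> seenByScm1 = List.of("dn1", "dn2", "dn3", "dn4");
    List<String> seenByScm2 = List.of("dn3", "dn4", "dn1", "dn2");
    // ...but both pick the same replica for deletion.
    System.out.println(pickExcess(seenByScm1, 3));
    System.out.println(pickExcess(seenByScm2, 3));
  }
}
```

If dn1 has not processed the first deletion when scm2 takes over, scm2 just re-sends the deletion to dn1, which targets the same replica instead of a second one; once dn1 has deleted r1, scm2 sees only three replicas and deletes nothing.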
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
