sodonnel commented on code in PR #3963:
URL: https://github.com/apache/ozone/pull/3963#discussion_r1029478291
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -325,10 +329,22 @@ public synchronized void processAll() {
containerManager.getContainers();
ReplicationManagerReport report = new ReplicationManagerReport();
ReplicationQueue newRepQueue = new ReplicationQueue();
+ Map<ContainerID, MoveDataNodePair> pendingMoves =
+ moveManaer.getPendingMove();
Review Comment:
A fundamental design problem with the old RM was that it placed many hours' or
even days' worth of commands on the DNs without any way to see what progress
they were making.
It seems the balancer has made this same mistake, in that it does not limit
what is queued on a DN. Ideally, the balancer should schedule no more work than
can be done in a heartbeat or two. If we have under-utilized nodes, they are
going to be the target for replicate commands, so the limit on the number
queued per heartbeat should be quite small. The delete commands might be more
scattered, but we should limit them too, just to a higher number.
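Something like the following sketch is what I have in mind for the throttling. All class and field names here are illustrative, not existing Ozone APIs, and the actual limits would be configurable:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cap the number of commands queued per datanode so
// the balancer never schedules more than a heartbeat or two of work.
public class DatanodeCommandThrottle {
  private final int replicateLimit;  // small: targets are under-utilized nodes
  private final int deleteLimit;     // higher: deletes are cheaper and scattered
  private final Map<String, Integer> pendingReplicates = new HashMap<>();
  private final Map<String, Integer> pendingDeletes = new HashMap<>();

  public DatanodeCommandThrottle(int replicateLimit, int deleteLimit) {
    this.replicateLimit = replicateLimit;
    this.deleteLimit = deleteLimit;
  }

  /** Returns true if a replicate command may still be queued on this DN. */
  public boolean tryQueueReplicate(String dnUuid) {
    int count = pendingReplicates.getOrDefault(dnUuid, 0);
    if (count >= replicateLimit) {
      return false;  // DN already has a heartbeat's worth of copy work
    }
    pendingReplicates.put(dnUuid, count + 1);
    return true;
  }

  /** Returns true if a delete command may still be queued on this DN. */
  public boolean tryQueueDelete(String dnUuid) {
    int count = pendingDeletes.getOrDefault(dnUuid, 0);
    if (count >= deleteLimit) {
      return false;
    }
    pendingDeletes.put(dnUuid, count + 1);
    return true;
  }
}
```

The counters would be decremented (or rebuilt) from the pending-op info reported in each heartbeat, so a slow DN naturally stops receiving new work.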
If we avoid deep queues on the DNs, then we also avoid the problem with a
leader switch on SCM.
At the moment ReplicationManager does not replicate its pending ops or
commands to each SCM. The goal is to keep the queue on the DN small, so that
even if a leader switch did happen, there is not much extra work queued;
otherwise RM can fall into this same problem of scheduling multiple deletes.
Better still, we could also make the DN reject any commands that don't carry
the current SCM leader term. For example, any replicate, delete container or
reconstruct container command can be discarded if the SCM term has increased,
and the replication manager will simply re-create the commands on its next
pass.
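On the DN side this term check could be as simple as the sketch below. The class and method names are hypothetical, and I'm assuming the heartbeat response already carries (or could carry) the leader term:

```java
// Hypothetical sketch of a datanode-side term check: commands are stamped
// with the SCM leader term that issued them, and the DN discards any command
// from a lower term than the highest it has seen. The new leader's
// ReplicationManager re-creates any dropped work on its next pass.
public class TermCheckedCommandQueue {
  private long highestSeenScmTerm = 0;

  /** Record the leader term reported in the latest SCM heartbeat response. */
  public void updateTerm(long term) {
    if (term > highestSeenScmTerm) {
      highestSeenScmTerm = term;
    }
  }

  /**
   * Returns true if the command should be executed; false if it was issued
   * by a stale leader and should be silently dropped.
   */
  public boolean shouldExecute(long commandTerm) {
    return commandTerm >= highestSeenScmTerm;
  }
}
```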
There isn't really any reason the Balancer could not do the same, i.e.
schedule work only on DNs with the capacity to process it, and then drop all
commands on a leader switch. It would make the balancer a lot simpler
overall.
Then, we could have the over-replication handling try to take space into
account and remove the replicas from the most over-used nodes, making it do
the correct thing more of the time. In the event of a failover, some
over-replicated containers caused by the balancer might have the wrong replica
removed, but if the free space is so finely balanced between the nodes that
the wrong one gets removed, it doesn't really matter.
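To make the space-aware part concrete, the selection could look like this sketch. The record and its `freeSpaceRatio` field are illustrative stand-ins for whatever node usage info RM has available:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: when a container is over-replicated, prefer to delete
// the replica hosted on the node with the least free space, so that
// over-replication handling also nudges the cluster toward balance.
public class SpaceAwareReplicaSelector {

  /** A replica location with the hosting node's free-space ratio (0..1). */
  public record ReplicaLocation(String dnUuid, double freeSpaceRatio) { }

  /** Pick the replica whose node is most full (smallest free-space ratio). */
  public static ReplicaLocation selectForDeletion(
      List<ReplicaLocation> replicas) {
    return replicas.stream()
        .min(Comparator.comparingDouble(ReplicaLocation::freeSpaceRatio))
        .orElseThrow();
  }
}
```

Even if a failover causes this to occasionally pick a different replica than the balancer intended, the outcome is still a valid replica count on a reasonably chosen node.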
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]