sodonnel commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1125885960

   For now, I would like to leave the current LegacyReplicationManager class as 
it is and not focus on Balancer for EC. I feel that a lot of the logic for 
pending replications is overly complex, and if we just lift it out of 
LegacyReplicationManager and into a new class, it does not improve things - we 
have really just renamed the code.
   
   The legacy replication manager internally keeps a list of all pending 
replications and deletes. Each time a container is checked, it checks this list 
and removes any replications that have been completed or expired. Then it gets 
the list of remaining pending operations to help decide if the container is 
healthy or not.
   
   Rather than the ReplicationManager removing the completed and expired 
replications, we could have a standalone PendingContainerOps monitor that 
works as follows:
   
   1. Replication Manager adds pending replications and deletes to it.
   2. Replication Manager queries it for anything pending for the current 
container and gets a list of PendingActions back.
   3. The PendingReplicationMonitor has its own internal thread that checks for 
expired replications and removes them.
   4. Completed replications and deletes are removed in ContainerManagerImpl, 
which has updateContainerReplica and removeContainerReplica triggered via the 
container reports (ICR and FCR) from the datanodes as they are replicated.
   
   This way, the ReplicationManager does not need to worry about expiring 
replications or removing completed entries. We also get the ability to have a 
more up-to-date view of the system, as the ICR / FCRs will keep the pending 
table up-to-date in real time, rather than having to wait for the container to 
be re-checked inside replication manager.
   
   We can have a fairly simple "ContainerReplicaPendingOps" class that is 
basically standalone and inject it into ReplicationManager and 
ContainerManagerImpl. This would allow for removing some complexity from RM and 
let the expiry and completion be tested in an isolated way.
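   
   As a rough illustration of the four steps above, here is a minimal sketch 
of what such a class could look like. Everything in it other than the class 
name is an assumption made for the example: the `OpType` and `PendingOp` 
types, the method names, the use of a plain `long` container ID and a `String` 
datanode ID, and passing the current time in as a parameter. It is not the 
eventual Ozone implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: names and signatures are illustrative, not Ozone's API.
final class ContainerReplicaPendingOps {

  enum OpType { ADD, DELETE }

  /** One pending replication (ADD) or delete (DELETE) targeting a datanode. */
  static final class PendingOp {
    final OpType type;
    final String datanode;
    final long scheduledAtMs;

    PendingOp(OpType type, String datanode, long scheduledAtMs) {
      this.type = type;
      this.datanode = datanode;
      this.scheduledAtMs = scheduledAtMs;
    }
  }

  // Container ID -> pending ops. Lists are replaced, never mutated in place,
  // so readers always see a consistent snapshot.
  private final Map<Long, List<PendingOp>> pending = new ConcurrentHashMap<>();

  /** Step 1: ReplicationManager records a newly scheduled replication or delete. */
  void schedule(long containerId, OpType type, String datanode, long nowMs) {
    pending.compute(containerId, (id, ops) -> {
      List<PendingOp> updated = ops == null ? new ArrayList<>() : new ArrayList<>(ops);
      updated.add(new PendingOp(type, datanode, nowMs));
      return updated;
    });
  }

  /** Step 2: ReplicationManager asks what is still pending for a container. */
  List<PendingOp> getPendingOps(long containerId) {
    List<PendingOp> ops = pending.get(containerId);
    return ops == null ? Collections.emptyList() : Collections.unmodifiableList(ops);
  }

  /** Step 4: called when a container report shows the op has taken effect. */
  boolean complete(long containerId, OpType type, String datanode) {
    boolean[] removed = {false};
    pending.computeIfPresent(containerId, (id, ops) -> {
      List<PendingOp> kept = new ArrayList<>(ops);
      removed[0] = kept.removeIf(op -> op.type == type && op.datanode.equals(datanode));
      return kept.isEmpty() ? null : kept; // returning null drops the map entry
    });
    return removed[0];
  }

  /** Step 3: called periodically by a background thread; returns ops removed. */
  int removeExpired(long nowMs, long timeoutMs) {
    int[] count = {0};
    for (Long containerId : pending.keySet()) {
      pending.computeIfPresent(containerId, (id, ops) -> {
        List<PendingOp> kept = new ArrayList<>(ops);
        int before = kept.size();
        kept.removeIf(op -> nowMs - op.scheduledAtMs >= timeoutMs);
        count[0] += before - kept.size();
        return kept.isEmpty() ? null : kept;
      });
    }
    return count[0];
  }
}
```

   In this sketch ReplicationManager would only call `schedule` and 
`getPendingOps`, ContainerManagerImpl would call `complete` from its replica 
update and removal paths, and a small scheduled thread would call 
`removeExpired`. Passing the time in explicitly is one way to let expiry be 
unit tested without sleeping.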
   
   I generally agree with your suggestions on the classes / functions we need 
to have around ReplicationManager. I have already started working on the health 
check interface in 
[HDDS-6697](https://issues.apache.org/jira/browse/HDDS-6697), but I got 
sidetracked into the EcContainerReplicaCounts class, which I realised I needed 
before going much further.
   
   I will have a go at creating the outline of what I described above and see 
how it looks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]