[ https://issues.apache.org/jira/browse/HDDS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell updated HDDS-6744:
------------------------------------
Summary: EC: ReplicationManager - create ContainerReplicaPendingOps class
and integrate with ContainerManager (was: EC: ReplicationManager - create
PendingContainerOps class and integrate with ContainerManager)
> EC: ReplicationManager - create ContainerReplicaPendingOps class and
> integrate with ContainerManager
> ----------------------------------------------------------------------------------------------------
>
> Key: HDDS-6744
> URL: https://issues.apache.org/jira/browse/HDDS-6744
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
> The legacy replication manager internally keeps a list of all pending
> replications and deletes. Each time a container is checked, it checks this
> list and removes any pending operations that have completed or expired. It
> then uses the remaining pending operations to help decide whether the
> container is healthy.
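For comparison, the legacy behaviour described above can be sketched roughly as follows. This is an illustrative Java sketch only. The class `LegacyPendingTracker`, its `PendingAction` type, and the completion rule it uses (an ADD is done once the target datanode holds a replica, a DELETE once it no longer does) are assumptions, not the actual legacy ReplicationManager code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the legacy pattern: the replication manager owns the
// pending list and prunes it each time it checks a container.
class LegacyPendingTracker {

  enum ActionType { ADD, DELETE }

  // One in-flight replication (ADD) or delete (DELETE) aimed at a datanode.
  static final class PendingAction {
    final ActionType type;
    final String datanode;
    final long deadlineMillis;

    PendingAction(ActionType type, String datanode, long deadlineMillis) {
      this.type = type;
      this.datanode = datanode;
      this.deadlineMillis = deadlineMillis;
    }
  }

  private final Map<Long, List<PendingAction>> pending = new HashMap<>();

  // Record an action the replication manager has just sent to a datanode.
  void schedule(long containerId, PendingAction action) {
    pending.computeIfAbsent(containerId, k -> new ArrayList<>()).add(action);
  }

  // Called on every container check: drop completed or expired actions,
  // then return what is still outstanding so health can be judged.
  List<PendingAction> checkContainer(long containerId,
      Set<String> replicaNodes, long nowMillis) {
    List<PendingAction> actions =
        pending.getOrDefault(containerId, new ArrayList<>());
    Iterator<PendingAction> it = actions.iterator();
    while (it.hasNext()) {
      PendingAction a = it.next();
      // Assumed completion rule: ADD is done once the target holds a replica,
      // DELETE is done once the target no longer holds one.
      boolean completed = a.type == ActionType.ADD
          ? replicaNodes.contains(a.datanode)
          : !replicaNodes.contains(a.datanode);
      boolean expired = nowMillis >= a.deadlineMillis;
      if (completed || expired) {
        it.remove();
      }
    }
    if (actions.isEmpty()) {
      pending.remove(containerId);
    }
    return new ArrayList<>(actions);
  }
}
```

The point to notice is that pruning only happens inside `checkContainer`, so a replication that has already finished stays in the list until the container is next checked.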
> Rather than having the ReplicationManager remove completed and expired
> replications itself, we could have a standalone PendingContainerOps monitor
> that works as follows:
> 1. Replication Manager adds pending replications and deletes to it.
> 2. Replication Manager queries it for anything pending for the current
> container and gets a list of PendingActions back.
> 3. The PendingReplicationMonitor has its own internal thread that checks for
> expired replications and removes them.
> 4. Completed replications and deletes are removed in ContainerManagerImpl,
> whose replica add and remove paths are already triggered by the container
> reports (ICR and FCR) that the datanodes send as containers are replicated.
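A minimal sketch of what such a standalone class might look like, following steps 1 to 4 above. All names here (`ContainerReplicaPendingOpsSketch`, `addPendingOp`, `getPendingOps`, `removeExpired`, `completeOp`, `start`) are hypothetical and do not describe the eventual Ozone API. A `java.time.Clock` is injected, as an assumption, so that expiry can be driven deterministically.

```java
import java.time.Clock;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical standalone pending-ops tracker; not the real Ozone class.
class ContainerReplicaPendingOpsSketch {

  enum OpType { ADD, DELETE }

  static final class PendingOp {
    final OpType type;
    final String datanode;
    final long deadlineMillis;

    PendingOp(OpType type, String datanode, long deadlineMillis) {
      this.type = type;
      this.datanode = datanode;
      this.deadlineMillis = deadlineMillis;
    }
  }

  private final Map<Long, List<PendingOp>> ops = new ConcurrentHashMap<>();
  private final Clock clock;
  private ScheduledExecutorService expiryThread;

  ContainerReplicaPendingOpsSketch(Clock clock) {
    this.clock = clock;
  }

  // Step 1: ReplicationManager records a replication or delete it has sent.
  void addPendingOp(long containerId, OpType type, String datanode,
      long timeoutMillis) {
    long deadline = clock.millis() + timeoutMillis;
    ops.compute(containerId, (id, list) -> {
      List<PendingOp> l = list;
      if (l == null) {
        l = new ArrayList<>();
      }
      l.add(new PendingOp(type, datanode, deadline));
      return l;
    });
  }

  // Step 2: ReplicationManager reads a snapshot of one container's ops.
  List<PendingOp> getPendingOps(long containerId) {
    List<PendingOp> snapshot = new ArrayList<>();
    ops.computeIfPresent(containerId, (id, list) -> {
      snapshot.addAll(list);
      return list;
    });
    return snapshot;
  }

  // Step 3: drop every op whose deadline has passed.
  void removeExpired() {
    long now = clock.millis();
    for (Long id : ops.keySet()) {
      ops.computeIfPresent(id, (k, list) -> {
        list.removeIf(op -> op.deadlineMillis <= now);
        return list.isEmpty() ? null : list; // null removes the map entry
      });
    }
  }

  // Step 3: the tracker's own daemon thread runs the expiry check.
  void start(long intervalMillis) {
    expiryThread = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "pending-ops-expiry");
      t.setDaemon(true);
      return t;
    });
    expiryThread.scheduleAtFixedRate(this::removeExpired,
        intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    if (expiryThread != null) {
      expiryThread.shutdownNow();
    }
  }

  // Step 4: ContainerManagerImpl calls this when an ICR/FCR shows the op
  // has finished. Returns true if a matching op was removed.
  boolean completeOp(long containerId, OpType type, String datanode) {
    boolean[] removed = {false};
    ops.computeIfPresent(containerId, (id, list) -> {
      removed[0] = list.removeIf(
          op -> op.type == type && op.datanode.equals(datanode));
      return list.isEmpty() ? null : list;
    });
    return removed[0];
  }
}
```

Keying the map by container ID keeps step 2 a single lookup, and because `ConcurrentHashMap.compute` and `computeIfPresent` run atomically per key, the expiry thread and the report handlers can update the same container's list without an external lock.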
> This way, the ReplicationManager does not need to worry about expiring
> replications or removing completed entries. We also get a more up-to-date
> view of the system, as the ICRs and FCRs keep the pending table current in
> real time, rather than it waiting for the container to be re-checked inside
> ReplicationManager.
> We can have a fairly simple, largely standalone "ContainerReplicaPendingOps"
> class and inject it into both ReplicationManager and ContainerManagerImpl.
> This would remove some complexity from RM and allow expiry and completion
> to be tested in isolation.
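To illustrate the injection idea, here is a hypothetical sketch in which one shared instance is passed to both a replication manager and a container manager through their constructors. `PendingAdds`, `ReplicationManagerSketch`, and `ContainerManagerSketch` are invented for illustration, and to keep it short they track only pending replications (no deletes or expiry).

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, minimal stand-in for the shared pending-ops class:
// tracks only pending replications (ADDs) per container.
class PendingAdds {
  private final Map<Long, Set<String>> adds = new ConcurrentHashMap<>();

  void add(long containerId, String datanode) {
    adds.computeIfAbsent(containerId, k -> ConcurrentHashMap.newKeySet())
        .add(datanode);
  }

  void complete(long containerId, String datanode) {
    adds.computeIfPresent(containerId, (k, s) -> {
      s.remove(datanode);
      return s.isEmpty() ? null : s; // drop the entry once nothing is pending
    });
  }

  int pendingCount(long containerId) {
    Set<String> s = adds.get(containerId);
    return s == null ? 0 : s.size();
  }
}

// Hypothetical RM: records replications it schedules and counts pending ones
// when judging health, but never completes or expires entries itself.
class ReplicationManagerSketch {
  private final PendingAdds pending;

  ReplicationManagerSketch(PendingAdds pending) {
    this.pending = pending;
  }

  void sendReplicate(long containerId, String target) {
    // A real RM would also send the replicate command to the datanode here.
    pending.add(containerId, target);
  }

  boolean isUnderReplicated(long containerId, int liveReplicas, int required) {
    return liveReplicas + pending.pendingCount(containerId) < required;
  }
}

// Hypothetical container manager: report processing marks ops complete.
class ContainerManagerSketch {
  private final PendingAdds pending;
  private final Map<Long, Set<String>> replicas = new ConcurrentHashMap<>();

  ContainerManagerSketch(PendingAdds pending) {
    this.pending = pending;
  }

  // Invoked for each replica seen in an ICR or FCR.
  void onReplicaReported(long containerId, String datanode) {
    replicas.computeIfAbsent(containerId, k -> ConcurrentHashMap.newKeySet())
        .add(datanode);
    pending.complete(containerId, datanode);
  }

  int replicaCount(long containerId) {
    Set<String> s = replicas.get(containerId);
    return s == null ? 0 : s.size();
  }
}
```

Because `PendingAdds` has no dependencies of its own, its completion logic can be unit-tested directly, and each consumer can be tested against a real instance rather than a mock.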
--
This message was sent by Atlassian Jira
(v8.20.7#820007)