[
https://issues.apache.org/jira/browse/HDDS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-7192:
---------------------------------
Labels: pull-request-available (was: )
> EC: ReplicationManager - create handlers to perform various container checks
> ----------------------------------------------------------------------------
>
> Key: HDDS-7192
> URL: https://issues.apache.org/jira/browse/HDDS-7192
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
>
> One of the goals I had in mind when developing the new Replication Manager
> was to make the code cleaner, easier to test and easier to follow. The Legacy
> Replication Manager has a lot of logic in a single class to handle all sorts
> of container states and conditions, which makes it hard to test and
> difficult to see what is even tested.
> If we define “business logic” as the rules and conditions which indicate
> whether a container is healthy, together with the rules and logic to fix any
> problems the container may have (bad replica states, over / under
> replication etc.), then the Replication Manager “infrastructure” or
> “plumbing” is what stitches the business logic together, provides access to
> queued containers, iterates over the containers etc. This logic is
> relatively simple in general.
> We have also started using the ReplicationManager class as a kind of proxy
> object, to
> My goal is to try to separate all the business logic into separate logical
> classes that can each be tested in isolation. Then the Replication Manager
> class itself can focus on iterating over the containers and
> applying the business logic, dispatching commands and queuing containers for
> remediation (under / over replication processing).
> Already I can see some areas where we are starting to copy the Legacy
> Replication Manager and put some business logic items into the
> Replication Manager processing flow, so I would like to stop and have a
> discussion on the best way forward.
> Looking at the logic we have so far, the replication manager checks look like:
> * Open Container Health Check - If the container is open and should be
> closed for some reason. Currently in the ReplicationManager class.
> * Unhealthy Replica Handling - If the container is closed and some replicas
> are not in the correct closed state. Currently in the ReplicationManager class.
> * Empty Container Handling - If the container is empty, remove its replicas.
> Not yet implemented, but there is a PR adding this into the RM class -
> https://github.com/apache/ozone/pull/3660
> * ECHealthCheck - under / over / mis replication (ECContainerHealthCheck)
> * RatisHealthCheck - under / over / mis replication (not yet implemented, but
> will be in RatisContainerHealthCheck)
> * The Legacy Replication Manager also has logic for CLOSING and QUASI_CLOSED
> containers that we need to fit in somewhere.
> What I would like to propose is that we turn each of the above checks into
> its own class. Some of the classes will be very small, and others will be
> medium sized.
> We can then use a variation of the "Chain of Responsibility" pattern to bring
> all the checks together into a chain.
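
To make the proposal concrete, here is a rough sketch of what such a chain
could look like. It is only an illustration under stated assumptions: the
Handler interface, the Container stand-in, the handler class names and the
string "actions" are all hypothetical and are not taken from the Ozone code
base. It also assumes that a handler returning true ends processing for that
container, and it walks an ordered list of handlers rather than linking each
handler to its successor, which is one possible variation on the classic
pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type and name below is hypothetical, not the Ozone API.
final class HealthCheckChainSketch {

  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  /** Minimal stand-in for a container and the facts the checks need. */
  static final class Container {
    final long id;
    final State state;
    final boolean shouldClose;
    final boolean empty;

    Container(long id, State state, boolean shouldClose, boolean empty) {
      this.id = id;
      this.state = state;
      this.shouldClose = shouldClose;
      this.empty = empty;
    }
  }

  /** One link in the chain; returns true to stop further processing. */
  interface Handler {
    boolean handle(Container c, List<String> actions);
  }

  /** Open containers: close if required. Assumed here to end the chain. */
  static final class OpenContainerHandler implements Handler {
    @Override
    public boolean handle(Container c, List<String> actions) {
      if (c.state != State.OPEN) {
        return false;
      }
      if (c.shouldClose) {
        actions.add("close:" + c.id);
      }
      return true;
    }
  }

  /** Closed and empty containers: remove the replicas. */
  static final class EmptyContainerHandler implements Handler {
    @Override
    public boolean handle(Container c, List<String> actions) {
      if (c.state == State.CLOSED && c.empty) {
        actions.add("delete-replicas:" + c.id);
        return true;
      }
      return false;
    }
  }

  /** Placeholder for the EC / Ratis under / over / mis replication checks. */
  static final class ReplicationCheckHandler implements Handler {
    @Override
    public boolean handle(Container c, List<String> actions) {
      actions.add("check-replication:" + c.id);
      return true;
    }
  }

  /** The order of the list defines the order of the checks. */
  static List<Handler> defaultChain() {
    List<Handler> chain = new ArrayList<>();
    chain.add(new OpenContainerHandler());
    chain.add(new EmptyContainerHandler());
    chain.add(new ReplicationCheckHandler());
    return chain;
  }

  /** The "plumbing": run each handler in turn until one handles the container. */
  static List<String> process(List<Handler> chain, Container c) {
    List<String> actions = new ArrayList<>();
    for (Handler h : chain) {
      if (h.handle(c, actions)) {
        break;
      }
    }
    return actions;
  }

  public static void main(String[] args) {
    List<Handler> chain = defaultChain();
    System.out.println(process(chain, new Container(1, State.OPEN, true, false)));
    System.out.println(process(chain, new Container(2, State.CLOSED, false, true)));
    System.out.println(process(chain, new Container(3, State.CLOSED, false, false)));
  }
}
```

With this shape, each handler can be unit tested on its own by passing it a
container and inspecting the recorded actions, and the Replication Manager
loop reduces to a call to process() per container.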