[ 
https://issues.apache.org/jira/browse/HDDS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDDS-7192:
--------------------------------------
    Status: Patch Available  (was: Open)

> EC: ReplicationManager - create handlers to perform various container checks
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-7192
>                 URL: https://issues.apache.org/jira/browse/HDDS-7192
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> One of the goals I had in mind when developing the new Replication Manager 
> was to make the code cleaner, easier to test and easier to follow. The Legacy 
> Replication Manager has a lot of logic in a single class, to handle all sorts 
> of container states and conditions and that makes it hard to test and 
> difficult to see what is even tested.
> If we define “business logic” as the rules and conditions which indicate if a 
> container is healthy or not, and also the rules and logic to fix any 
> conditions the container may have, such as bad replica states, over / under 
> replication etc.
> Then the Replication Manager “infrastructure” or “plumbing” is what stitches 
> the business logic together, provides access to queued containers, iterates 
> over the containers etc. This logic is relatively simple in general.
> We have also started using the ReplicationManager class as a kind of proxy 
> object, to 
> My goal is to try to separate all the business logic into separate logical 
> classes that can each be tested in isolation. Then the Replication Manager 
> class itself can focus on iterating over the container containers and 
> applying the business logic, dispatching commands and queuing containers for 
> remediation (under / over replication processing).
> Already I can see some areas where we are starting to copy the legacy 
> replication manager and putting some business logic items into the 
> Replication Manager processing flow, so I would like to stop and have a 
> discussion on the best way forward.
> Looking at the logic we have so far, the replication manager checks look like:
> * Open Container Healthy check - If the container is open and should be 
> closed for some reason. Currently in ReplicationManager class
> * Unhealthy Replica Handing - If the container is closed and some replicas 
> are not in the correct closed state. Currently in Replication Manager class.
> * Empty Container Handling - If the container is empty, remove its replicas. 
> Not yet implemented, but there is a PR adding this into the RM class - 
> https://github.com/apache/ozone/pull/3660
> * ECHealthCheck - under / over / mis replication (ECContainerHealthCheck)
> * RatisHealthCheck - under / over/ mis replication (not yet implemented, but 
> will be in RatisContainerHealthCheck)
> * The legacy Manager also has logic for CLOSING and QUASI_CLOSED containers 
> we need to get in somewhere.
> What I would like to propose is that we should make each of these above 
> checks into their own class. Some of the classes will be very small, and 
> others will be medium sized.
> We can then use a variation of the "Chain of Responsibility" pattern to bring 
> all the checks together into a chain.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to