Stephen O'Donnell created HDDS-7192:
---------------------------------------

             Summary: EC: ReplicationManager - create handlers to perform 
various container checks
                 Key: HDDS-7192
                 URL: https://issues.apache.org/jira/browse/HDDS-7192
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


One of the goals I had in mind when developing the new Replication Manager was 
to make the code cleaner, easier to test and easier to follow. The Legacy 
Replication Manager has a lot of logic in a single class, to handle all sorts 
of container states and conditions and that makes it hard to test and difficult 
to see what is even tested.

If we define “business logic” as the rules and conditions which indicate if a 
container is healthy or not, and also the rules and logic to fix any conditions 
the container may have, such as bad replica states, over / under replication 
etc.

Then the Replication Manager “infrastructure” or “plumbing” is what stitches 
the business logic together, provides access to queued containers, iterates 
over the containers etc. This logic is relatively simple in general.

We have also started using the ReplicationManager class as a kind of proxy 
object, to 

My goal is to try to separate all the business logic into separate logical 
classes that can each be tested in isolation. Then the Replication Manager 
class itself can focus on iterating over the container containers and applying 
the business logic, dispatching commands and queuing containers for remediation 
(under / over replication processing).

Already I can see some areas where we are starting to copy the legacy 
replication manager and putting some business logic items into the Replication 
Manager processing flow, so I would like to stop and have a discussion on the 
best way forward.

Looking at the logic we have so far, the replication manager checks look like:

* Open Container Healthy check - If the container is open and should be closed 
for some reason. Currently in ReplicationManager class
* Unhealthy Replica Handing - If the container is closed and some replicas are 
not in the correct closed state. Currently in Replication Manager class.
* Empty Container Handling - If the container is empty, remove its replicas. 
Not yet implemented, but there is a PR adding this into the RM class - 
https://github.com/apache/ozone/pull/3660
* ECHealthCheck - under / over / mis replication (ECContainerHealthCheck)
* RatisHealthCheck - under / over/ mis replication (not yet implemented, but 
will be in RatisContainerHealthCheck)
* The legacy Manager also has logic for CLOSING and QUASI_CLOSED containers we 
need to get in somewhere.

What I would like to propose is that we should make each of these above checks 
into their own class. Some of the classes will be very small, and others will 
be medium sized.

We can then use a variation of the "Chain of Responsibility" pattern to bring 
all the checks together into a chain.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to