Stephen O'Donnell created HDDS-7192:
---------------------------------------
Summary: EC: ReplicationManager - create handlers to perform
various container checks
Key: HDDS-7192
URL: https://issues.apache.org/jira/browse/HDDS-7192
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
One of the goals I had in mind when developing the new Replication Manager was
to make the code cleaner, easier to test and easier to follow. The Legacy
Replication Manager has a lot of logic in a single class to handle all sorts
of container states and conditions, which makes it hard to test and difficult
to see what is even tested.
Suppose we define “business logic” as the rules and conditions which indicate
whether a container is healthy, together with the rules and logic to fix any
problems the container may have, such as bad replica states or over / under
replication.
The Replication Manager “infrastructure” or “plumbing” is then what stitches
the business logic together, provides access to queued containers, iterates
over the containers and so on. This plumbing is relatively simple in general.
We have also started using the ReplicationManager class as a kind of proxy
object, to
My goal is to try to separate all the business logic into separate logical
classes that can each be tested in isolation. Then the Replication Manager
class itself can focus on iterating over the containers and applying
the business logic, dispatching commands and queuing containers for remediation
(under / over replication processing).
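To illustrate what "tested in isolation" could look like, here is a minimal sketch of one rule pulled out into its own class. Every name in it (EmptyContainerCheckSketch, ContainerView, shouldRemoveReplicas) is hypothetical and is not an existing Ozone class, and the restriction to CLOSED containers is an assumption made for the example.

```java
// Hypothetical sketch: none of these names are real Ozone classes.
final class EmptyContainerCheckSketch {

  /** Minimal stand-in for the container details a rule needs. */
  static final class ContainerView {
    final long id;
    final String state;   // lifecycle state, e.g. "OPEN" or "CLOSED"
    final long keyCount;  // number of keys stored in the container

    ContainerView(long id, String state, long keyCount) {
      this.id = id;
      this.state = state;
      this.keyCount = keyCount;
    }
  }

  /**
   * The rule itself: a pure function of its input, with no access to
   * queues, schedulers or the network, so a unit test can call it directly.
   * Assumption: only CLOSED containers are eligible for replica removal.
   */
  static boolean shouldRemoveReplicas(ContainerView c) {
    return "CLOSED".equals(c.state) && c.keyCount == 0;
  }

  public static void main(String[] args) {
    ContainerView emptyClosed = new ContainerView(1L, "CLOSED", 0L);
    ContainerView emptyOpen = new ContainerView(2L, "OPEN", 0L);
    ContainerView fullClosed = new ContainerView(3L, "CLOSED", 42L);

    System.out.println(shouldRemoveReplicas(emptyClosed)); // true
    System.out.println(shouldRemoveReplicas(emptyOpen));   // false
    System.out.println(shouldRemoveReplicas(fullClosed));  // false
  }
}
```

Because the rule depends only on the value passed in, each condition can be covered by a small unit test without starting any Replication Manager plumbing.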
Already I can see some areas where we are starting to copy the Legacy
Replication Manager by putting some business logic items into the Replication
Manager processing flow, so I would like to stop and have a discussion on the
best way forward.
Looking at the logic we have so far, the replication manager checks look like:
* Open Container Health Check - If the container is open and should be closed
for some reason. Currently in the Replication Manager class.
* Unhealthy Replica Handling - If the container is closed and some replicas are
not in the correct closed state. Currently in the Replication Manager class.
* Empty Container Handling - If the container is empty, remove its replicas.
Not yet implemented, but there is a PR adding this into the RM class -
https://github.com/apache/ozone/pull/3660
* ECHealthCheck - under / over / mis replication (ECContainerHealthCheck)
* RatisHealthCheck - under / over / mis replication (not yet implemented, but
will be in RatisContainerHealthCheck)
* The Legacy Replication Manager also has logic for CLOSING and QUASI_CLOSED
containers, which we need to fit in somewhere.
What I would like to propose is that we should make each of the above checks
into their own class. Some of the classes will be very small, and others will
be medium sized.
We can then use a variation of the "Chain of Responsibility" pattern to bring
all the checks together into a chain.
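A rough sketch of how such a chain might fit together is shown below. It is not the proposed implementation: the HealthCheck interface, the three check classes, the action strings and the ContainerView type are all hypothetical names invented for illustration, and the rules inside each check are deliberately simplified. The variation assumed here is that each check returns true when it has fully dealt with the container, which stops the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are real Ozone classes.
final class HealthCheckChainSketch {

  /** Simplified stand-in for a container and its replica count. */
  static final class ContainerView {
    final long id;
    final String state;          // e.g. "OPEN" or "CLOSED"
    final long keyCount;
    final int replicas;          // replicas currently reported
    final int expectedReplicas;  // replicas the replication config requires

    ContainerView(long id, String state, long keyCount,
                  int replicas, int expectedReplicas) {
      this.id = id;
      this.state = state;
      this.keyCount = keyCount;
      this.replicas = replicas;
      this.expectedReplicas = expectedReplicas;
    }
  }

  /** One link in the chain. Returning true stops further checks. */
  interface HealthCheck {
    boolean handle(ContainerView c, List<String> actions);
  }

  /** Assumption: open containers are skipped by all later checks. */
  static final class OpenContainerCheck implements HealthCheck {
    public boolean handle(ContainerView c, List<String> actions) {
      return "OPEN".equals(c.state);
    }
  }

  static final class EmptyContainerCheck implements HealthCheck {
    public boolean handle(ContainerView c, List<String> actions) {
      if (c.keyCount != 0) {
        return false;
      }
      actions.add("DELETE_REPLICAS");
      return true;
    }
  }

  static final class ReplicationCheck implements HealthCheck {
    public boolean handle(ContainerView c, List<String> actions) {
      if (c.replicas < c.expectedReplicas) {
        actions.add("QUEUE_UNDER_REPLICATED");
        return true;
      }
      if (c.replicas > c.expectedReplicas) {
        actions.add("QUEUE_OVER_REPLICATED");
        return true;
      }
      return false;
    }
  }

  static List<HealthCheck> defaultChain() {
    return Arrays.asList(new OpenContainerCheck(),
        new EmptyContainerCheck(), new ReplicationCheck());
  }

  /** The "plumbing": run the checks in order until one handles the container. */
  static List<String> process(ContainerView c, List<HealthCheck> chain) {
    List<String> actions = new ArrayList<>();
    for (HealthCheck check : chain) {
      if (check.handle(c, actions)) {
        break;
      }
    }
    return actions;
  }

  public static void main(String[] args) {
    List<HealthCheck> chain = defaultChain();
    System.out.println(process(new ContainerView(1, "OPEN", 0, 3, 3), chain));
    System.out.println(process(new ContainerView(2, "CLOSED", 0, 3, 3), chain));
    System.out.println(process(new ContainerView(3, "CLOSED", 10, 2, 3), chain));
    System.out.println(process(new ContainerView(4, "CLOSED", 10, 3, 3), chain));
  }
}
```

With this shape, the order of the checks is set in one place (defaultChain), each check can be unit tested on its own, and the processing loop contains no container rules at all.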