Stephen O'Donnell created HDDS-6975:
---------------------------------------

             Summary: EC: Define the value of Maintenance Redundancy for EC 
containers
                 Key: HDDS-6975
                 URL: https://issues.apache.org/jira/browse/HDDS-6975
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: SCM
            Reporter: Stephen O'Donnell


For Ratis containers, we have a setting 
hdds.scm.replication.maintenance.replica.minimum, which defaults to 2. This 
indicates how many replicas must remain online for a node to be allowed to 
enter maintenance.

For EC, we need to decide if we reuse this setting, and what it means.

For example, for Ratis containers, with 3 replicas and the default of 2, you 
should be able to take any single node offline without any replication.

With EC, if you had 3-2 containers and required a remaining redundancy of 2, 
then you would have to replicate whenever any node goes offline.

However, the other EC schemes, 6-3 and 10-4, are at least as good as Ratis 
with a default of 2.
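
The comparison above can be sketched as a small calculation. This is an
illustration only, not Ozone code: the helper names are hypothetical, and it
assumes the Ratis setting counts replicas that must stay online, while the EC
value counts how many further replicas could still be lost.

```python
# Illustrative sketch; these helpers are hypothetical and not part of Ozone.

def ratis_offline_allowed(replica_count: int, min_online: int) -> int:
    """Replicas that may be offline before replication is needed (Ratis)."""
    return max(replica_count - min_online, 0)


def ec_offline_allowed(parity_num: int, min_remaining_redundancy: int) -> int:
    """Replicas that may be offline before replication is needed (EC)."""
    return max(parity_num - min_remaining_redundancy, 0)


# Ratis with 3 replicas and the default minimum of 2.
ratis = ratis_offline_allowed(replica_count=3, min_online=2)

# EC schemes written as data-parity, each with a fixed remaining redundancy of 2.
ec = {f"{d}-{p}": ec_offline_allowed(p, 2) for d, p in [(3, 2), (6, 3), (10, 4)]}

print("Ratis, 3 replicas:", ratis)
for scheme, offline in ec.items():
    print(f"EC {scheme}:", offline)
```

Under these assumptions, a fixed value of 2 lets 6-3 and 10-4 take at least as
many replicas offline as Ratis, while 3-2 cannot take any replica offline
without replication.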

A better setting for EC might be parityNum / 2, rounded down by integer 
division:

3-2 = 1
6-3 = 1
10-4 = 2 remaining redundancy.
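
As a rough sketch of the parityNum / 2 idea, the values listed above can be
reproduced as follows. The function name is hypothetical and not an existing
Ozone API.

```python
# Illustrative sketch; maintenance_redundancy is a hypothetical helper.

def maintenance_redundancy(parity_num: int) -> int:
    """Proposed remaining redundancy: parityNum / 2 using integer division."""
    return parity_num // 2


for data_num, parity_num in [(3, 2), (6, 3), (10, 4)]:
    remaining = maintenance_redundancy(parity_num)
    # Whatever parity is not reserved as remaining redundancy may be offline.
    may_be_offline = parity_num - remaining
    print(f"{data_num}-{parity_num}: remaining redundancy {remaining}, "
          f"up to {may_be_offline} replica(s) offline")
```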

Or perhaps parityNum - X, where X is the number of replicas allowed to be 
offline in an EC group.
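
A minimal sketch of the parityNum - X alternative, assuming X would be a single
cluster-wide value. The function and parameter names are hypothetical.

```python
# Illustrative sketch; remaining_redundancy and max_offline are hypothetical.

def remaining_redundancy(parity_num: int, max_offline: int) -> int:
    """Remaining redundancy when max_offline (X) replicas may be offline."""
    if not 0 <= max_offline <= parity_num:
        raise ValueError("max_offline must be between 0 and parity_num")
    return parity_num - max_offline


# With X = 1, every scheme allows one replica offline and keeps the rest
# of its parity as remaining redundancy.
for data_num, parity_num in [(3, 2), (6, 3), (10, 4)]:
    print(f"{data_num}-{parity_num}:", remaining_redundancy(parity_num, 1))
```

Unlike a fixed remaining redundancy, this form fixes the number of offline
replicas and lets the remaining redundancy grow with the parity count.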

It's also worth noting that for EC, maintenance mode makes reads much more 
expensive. Potentially all reads will turn into online reconstruction. For 
Ratis, it just reduces the number of nodes available to read from.

With that in mind, another option for EC is to keep all data replicas online 
during maintenance, with only the parity replicas, which provide the 
redundancy, allowed to be offline. I feel that would be a trickier feature, 
and something we may consider in the future.
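
One way such a rule could look is sketched below. This is an assumption-laden
illustration, not a proposed implementation: it assumes replica indexes run
from 1 to data + parity with the data replicas first, and all names are
hypothetical.

```python
# Illustrative sketch; assumes indexes 1..data_num are data replicas and
# data_num+1..data_num+parity_num are parity replicas.

def needs_reconstruction(data_num: int, offline_indexes: set[int]) -> bool:
    """True if any data replica is offline, forcing reconstruction reads."""
    return any(1 <= i <= data_num for i in offline_indexes)


def maintenance_allowed(data_num: int, parity_num: int,
                        offline_indexes: set[int], min_remaining: int) -> bool:
    """Allow maintenance only if no data replica is offline and enough
    redundancy remains among the parity replicas."""
    if needs_reconstruction(data_num, offline_indexes):
        return False
    return parity_num - len(offline_indexes) >= min_remaining


# EC 6-3: indexes 1-6 are data, 7-9 are parity.
print(maintenance_allowed(6, 3, {7}, min_remaining=1))     # parity only
print(maintenance_allowed(6, 3, {2}, min_remaining=1))     # data replica offline
print(maintenance_allowed(6, 3, {7, 8, 9}, min_remaining=1))  # no redundancy left
```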



