[ 
https://issues.apache.org/jira/browse/HDDS-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDDS-6975:
--------------------------------------
    Status: Patch Available  (was: Open)

> EC: Define the value of Maintenance Redundancy for EC containers
> ----------------------------------------------------------------
>
>                 Key: HDDS-6975
>                 URL: https://issues.apache.org/jira/browse/HDDS-6975
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> For Ratis, the number of replicas which must be available when a node goes 
> into maintenance is a simple integer defaulting to 2 in 
> hdds.scm.replication.maintenance.replica.minimum. 
> This means that for a Ratis container, one out of the 3 nodes can be offline 
> without any replication happening. This can be set to 1, letting two go 
> offline, or to 3, ensuring full redundancy and hence replication when any 
> node is taken offline.
> It could be argued that 1 would be a better default here. With the default 
> placement of 2 replicas on one rack and 1 on another rack, that should allow 
> for a full rack to be taken offline without replication.
> For EC, it's a little more tricky. Aside from Ratis 1 containers, which are 
> rarely used in practice, EC can tolerate 2 offline (for 3-2), 3 (for 6-3) or 
> 4 (for 10-4).
> If we use the same default of 2, that means replication will always be 
> required for 3-2 containers. Also the "number of replicas online" doesn't 
> make as much sense for EC, as each replica is not identical.
> EC is also slightly more tricky - when any of the data copies are offline, 
> online reconstruction must be used to read the data, causing a performance 
> penalty, but that cannot be avoided.
> If we take the Ratis default of 2 - when there are two replicas out of 3 
> online, then we have a remaining redundancy of 1 - i.e. we can afford to lose 
> one more copy and still read data.
> If we change the Ratis setting to 1, there is a remaining redundancy of 0, 
> because the loss of another replica renders the data unreadable.
> For EC, if we default the setting to a "remaining redundancy" of 1, this 
> would mean we can tolerate the loss of 1 more replica and still read the data.
> This would allow 3-2 to have 1 replica offline, 6-3 to have 2 and 10-4 to 
> have 3 without any replication. In all cases the data redundancy is the same 
> as with Ratis having 2 replicas online.
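
The arithmetic above can be sketched as follows. This is only an illustration, assuming an EC scheme is fully described by its data and parity counts; the function name `max_offline_for_maintenance` and its parameters are hypothetical and are not taken from the Ozone code base.

```python
def max_offline_for_maintenance(parity: int, remaining_redundancy: int = 1) -> int:
    """Number of replicas that may be offline before replication is needed.

    An EC container with `parity` parity replicas stays readable after losing
    up to `parity` replicas. Holding `remaining_redundancy` of those in
    reserve leaves `parity - remaining_redundancy` free to go offline.
    """
    if remaining_redundancy < 0:
        raise ValueError("remaining_redundancy must not be negative")
    return max(parity - remaining_redundancy, 0)


# (data, parity) counts for the schemes named in the description.
SCHEMES = {"3-2": (3, 2), "6-3": (6, 3), "10-4": (10, 4)}

for name, (_data, parity) in SCHEMES.items():
    print(name, max_offline_for_maintenance(parity))
```

With the proposed default of 1 this gives 1, 2 and 3 offline replicas for 3-2, 6-3 and 10-4 respectively, matching the figures in the description.
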
> Additionally, it's highly likely that online recovery will be needed to read 
> the data, e.g. if 1 container is offline in 10-4 there is a 10 in 14 (5 in 7) 
> chance it's a data container, so trying to keep more containers online for 
> larger EC groups is probably not going to help performance much.
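
The 10 in 14 figure can be checked with a short calculation. This sketch assumes every replica is equally likely to be the one taken offline, which the description implies but does not state; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def chance_data_replica_offline(data: int, parity: int, offline: int = 1) -> Fraction:
    """Probability that at least one of `offline` offline replicas holds data.

    Assumes the offline replicas are a uniformly random subset of the
    data + parity replicas. The complement is the case where every offline
    replica happens to be a parity replica.
    """
    total = data + parity
    if not 0 <= offline <= total:
        raise ValueError("offline must be between 0 and data + parity")
    all_parity = Fraction(comb(parity, offline), comb(total, offline))
    return 1 - all_parity


print(chance_data_replica_offline(10, 4))     # one replica offline in 10-4
print(chance_data_replica_offline(10, 4, 3))  # three replicas offline in 10-4
```

For a single offline replica in 10-4 this reduces to 10/14 = 5/7. Under the same assumption, with three replicas offline the chance that a data replica is among them rises to 90/91, which is consistent with the point that online recovery is hard to avoid in larger EC groups.
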
> In a large cluster, ideally EC containers will be spread across racks such 
> that there is only 1 replica per rack, so taking a full rack offline would 
> only reduce the redundancy by 1, meaning even 3-2 containers could tolerate a 
> rack going into maintenance.
> In summary, I believe the simplest solution is to have an EC setting 
> hdds.scm.replication.maintenance.ec.remaining.redundancy = 1, which we use 
> for maintenance of EC containers and which is basically equivalent to the 
> Ratis default of 2. It may make sense to call the new parameter 
> hdds.scm.replication.maintenance.remaining.redundancy and use the same value 
> for both Ratis and EC, deprecating the old value.
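
To illustrate how a single "remaining redundancy" value could cover both Ratis and EC, here is a minimal sketch. It is not the SCM implementation: the function names are hypothetical, and it assumes a Ratis container needs 1 online replica to be readable while an EC container needs as many online replicas as it has data blocks.

```python
def remaining_redundancy(required_to_read: int, online: int) -> int:
    """How many more replicas can be lost while the data stays readable."""
    return online - required_to_read


def needs_replication(required_to_read: int, online: int,
                      min_remaining_redundancy: int = 1) -> bool:
    """True if maintenance leaves less redundancy than the configured minimum."""
    return remaining_redundancy(required_to_read, online) < min_remaining_redundancy


# Ratis with 3 replicas: any single replica is enough to read.
print(needs_replication(required_to_read=1, online=2))   # 1 of 3 offline
print(needs_replication(required_to_read=1, online=1))   # 2 of 3 offline

# EC 3-2: 3 of the 5 replicas are needed to read.
print(needs_replication(required_to_read=3, online=4))   # 1 of 5 offline
print(needs_replication(required_to_read=3, online=3))   # 2 of 5 offline
```

With a minimum of 1, Ratis with 2 of 3 replicas online and EC 3-2 with 4 of 5 online both pass without replication, while losing one more replica in either case triggers it. This mirrors the equivalence with the Ratis default of 2 described above.
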


