sodonnel opened a new pull request, #3723:
URL: https://github.com/apache/ozone/pull/3723

   ## What changes were proposed in this pull request?
   
   For Ratis, the number of replicas which must remain available when a node goes into maintenance is a simple integer, defaulting to 2, set via hdds.scm.replication.maintenance.replica.minimum.
   
   This means that for a Ratis container, one out of the 3 nodes can be offline without any replication happening. This can be set to 1, letting two go offline, or to 3, ensuring full redundancy and hence triggering replication whenever any node is taken offline.
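   For reference, the existing Ratis setting is configured as a plain integer (an illustrative ozone-site.xml fragment; the value shown is the current default):

```xml
<!-- Illustrative fragment: the number of replicas that must remain
     online for a Ratis container while a node is in maintenance. -->
<property>
  <name>hdds.scm.replication.maintenance.replica.minimum</name>
  <value>2</value>
</property>
```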
   
   It could be argued that 1 would be a better default here. With the default 
placement of 2 replicas on one rack and 1 on another rack, that should allow 
for a full rack to be taken offline without replication.
   
   For EC, it's a little more tricky. Aside from Ratis ONE containers, which are rarely used in practice, EC can tolerate 2 replicas offline (for 3-2), 3 (for 6-3) or 4 (for 10-4).
   
   If we use the same default of 2, that means replication will always be 
required for 3-2 containers. Also the "number of replicas online" doesn't make 
as much sense for EC, as each replica is not identical.
   
   EC is also slightly more tricky: when any of the data replicas are offline, online reconstruction must be used to read the data, causing a performance penalty, but that cannot be avoided.
   
   If we take the Ratis default of 2: when there are two replicas out of 3 online, we have a remaining redundancy of 1, i.e. we can afford to lose one more copy and still read the data.
   
   If we change the Ratis setting to 1, there is a remaining redundancy of 0, 
because the loss of another replica renders the data unreadable.
   
   For EC, if we default the setting to a "remaining redundancy" of 1, this would mean we can tolerate the loss of 1 more replica and still read the data.
   
   This would allow a 3-2 container to have 1 replica offline, 6-3 to have 2 and 10-4 to have 3 without any replication. In all cases the remaining data redundancy is the same as with the Ratis default of 2, i.e. one replica offline.
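   The tolerances above reduce to simple arithmetic. A minimal sketch (hypothetical class and method names for illustration, not Ozone's actual ReplicationManager code):

```java
// Hypothetical sketch of the "remaining redundancy" arithmetic described
// above; not the actual Ozone ReplicationManager implementation.
public class RemainingRedundancy {

    // For EC d-p, up to p replicas can be lost before data is unreadable,
    // so remaining redundancy after `offline` replicas go away is p - offline.
    static int ecRemainingRedundancy(int parity, int offline) {
        return parity - offline;
    }

    // For Ratis, data is readable as long as one replica survives,
    // so remaining redundancy is online - 1.
    static int ratisRemainingRedundancy(int online) {
        return online - 1;
    }

    public static void main(String[] args) {
        // With a required remaining redundancy of 1:
        // 3-2 tolerates 1 offline, 6-3 tolerates 2, 10-4 tolerates 3.
        System.out.println(ecRemainingRedundancy(2, 1)); // 1
        System.out.println(ecRemainingRedundancy(3, 2)); // 1
        System.out.println(ecRemainingRedundancy(4, 3)); // 1
        // The Ratis default of 2 online out of 3 also leaves redundancy 1.
        System.out.println(ratisRemainingRedundancy(2)); // 1
    }
}
```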
   
   Additionally, it's highly likely online recovery will be needed to read the data anyway. E.g. if 1 replica is offline in a 10-4 group there is a 10 in 14 (5 in 7) chance it is a data replica, so trying to keep more replicas online for larger EC groups is probably not going to help performance much.
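   The 10 in 14 figure is simply data blocks over total blocks in the group; a quick sketch of that calculation (hypothetical helper, for illustration only):

```java
// Hypothetical sketch: the probability that a single offline replica in
// an EC group is a data block rather than a parity block.
public class OfflineDataProbability {

    static double dataBlockProbability(int data, int parity) {
        return (double) data / (data + parity);
    }

    public static void main(String[] args) {
        // For EC 10-4: 10 of the 14 replicas are data, i.e. 5/7.
        System.out.println(dataBlockProbability(10, 4));
    }
}
```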
   
   In a large cluster, ideally EC containers will be spread across racks such that there is only 1 replica per rack, so taking a full rack offline would only reduce the redundancy by 1, meaning even 3-2 containers could tolerate a rack going into maintenance.
   
   In summary, I believe the simplest solution is to have an EC setting hdds.scm.replication.maintenance.ec.remaining.redundancy = 1, which we use for maintenance of EC containers and which is basically equivalent to the Ratis default of 2. It may make sense to call the new parameter hdds.scm.replication.maintenance.remaining.redundancy and use the same value for both Ratis and EC, deprecating the old value.
   
   For now, in this change I have added "hdds.scm.replication.maintenance.remaining.redundancy" and noted in the comments / docs that this is for EC only. We should consider how to deprecate the old parameter and bring the two together in another Jira. I am reluctant to call this one `hdds.scm.replication.ec.maintenance.remaining.redundancy`, as then we would have to deprecate two parameters in the future.
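   As an illustration, the new setting added in this change would be configured like so (ozone-site.xml fragment; the value shown is the proposed default):

```xml
<!-- Illustrative fragment: the minimum remaining redundancy an EC
     container must keep while replicas are in maintenance. -->
<property>
  <name>hdds.scm.replication.maintenance.remaining.redundancy</name>
  <value>1</value>
</property>
```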
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6975
   
   ## How was this patch tested?
   
   Existing tests cover the maintenance counts in the new RM related classes. I 
also modified the decommission and maintenance tests to include some EC data, 
and hence fully test the decommission and maintenance flows with EC data in 
place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

