[jira] [Resolved] (HDDS-8535) ReplicationManager: Unhealthy containers could block EC recovery in small clusters

Stephen O'Donnell (Jira) Thu, 25 May 2023 09:20:03 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stephen O'Donnell resolved HDDS-8535.
-------------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

> ReplicationManager: Unhealthy containers could block EC recovery in small 
> clusters
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-8535
>                 URL: https://issues.apache.org/jira/browse/HDDS-8535
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Siddhant Sangwan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> With EC containers, in a small cluster of, say, 6 nodes using EC-3-2, a 
> container requires 5 nodes. If 2 replicas become unhealthy, 
> reconstruction is required to recover both, but there is 
> only 1 spare node.
> This means one replica will get recovered, leaving 4 "good" replicas and 2 
> "unhealthy" ones, and the container will remain stuck like this because 
> unhealthy replicas are only removed once the container has no over- or 
> under-replication.
> A similar problem was resolved previously, where an EC container with both 
> over- and under-replication could get stuck in the same way because 
> under-replication handling could not proceed due to insufficient spare nodes. 
> In that case, the solution was to detect the condition and call the 
> over-replication handler to clear up the excess replicas. A similar solution 
> is required here: remove some unhealthy replicas so that progress can be made.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)