sodonnel commented on PR #3920:
URL: https://github.com/apache/ozone/pull/3920#issuecomment-1317282315

Looking at the suggestion for CLOSED containers:

> If all replicas are unhealthy, they should be replicated like healthy containers.

This is difficult, if not impossible, for EC. For EC we need to reconstruct the container by reading the entire contents from a quorum of the other containers and generating the missing data. If all of the containers are legitimately unhealthy, with some sort of read problem, we are going to get errors reading some of the blocks and be unable to reconstruct the data. A partial reconstruction would likely be possible, but that comes with significant complexity too, as some blocks might be missing from one of the containers in the group, which would trip up clients trying to read them.

For now, what we have done with EC is treat an UNHEALTHY replica much like a missing one. If we have one or more, we treat the container as under-replicated and try to fix it. We also exclude UNHEALTHY replicas from the over-replication handling and treat them as if they are already gone. That way, over-replication handling will not remove them. Only if the container is neither over- nor under-replicated will we remove the unhealthy replicas.

The problem we are left with is that if too many replicas are unhealthy, we cannot do a reconstruction and hence cannot fix the problem. We will not replicate them either, as we cannot really do that, so it is possible to lose some of the unhealthy replicas over time due to disk failures etc.
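
To make the EC handling described above concrete, here is a minimal sketch of the decision order. It is not the actual Ozone ReplicationManager code: the class, enum, and method names are all hypothetical, and it assumes that "a quorum" means at least `data` distinct healthy replica indexes for an EC scheme with `data` data and `parity` parity replicas.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
final class EcUnhealthySketch {

    enum ReplicaState { HEALTHY, UNHEALTHY }

    enum Action { RECONSTRUCT, UNRECOVERABLE, REMOVE_EXCESS_HEALTHY, DELETE_UNHEALTHY, NONE }

    /** One replica of an EC container: its replica index and its reported state. */
    static final class Replica {
        final int index;
        final ReplicaState state;

        Replica(int index, ReplicaState state) {
            this.index = index;
            this.state = state;
        }
    }

    static Replica healthy(int index) {
        return new Replica(index, ReplicaState.HEALTHY);
    }

    static Replica unhealthy(int index) {
        return new Replica(index, ReplicaState.UNHEALTHY);
    }

    static Action decide(List<Replica> replicas, int data, int parity) {
        int required = data + parity;
        Set<Integer> healthyIndexes = new HashSet<>();
        int healthyCount = 0;
        boolean anyUnhealthy = false;

        // UNHEALTHY replicas are counted as if they were already gone.
        for (Replica r : replicas) {
            if (r.state == ReplicaState.HEALTHY) {
                healthyIndexes.add(r.index);
                healthyCount++;
            } else {
                anyUnhealthy = true;
            }
        }

        // Under-replicated: some index has no healthy replica.
        if (healthyIndexes.size() < required) {
            // Assumption: reconstruction needs at least `data` distinct healthy indexes.
            return healthyIndexes.size() >= data ? Action.RECONSTRUCT : Action.UNRECOVERABLE;
        }

        // Over-replication only looks at healthy replicas, so it never removes unhealthy ones.
        if (healthyCount > required) {
            return Action.REMOVE_EXCESS_HEALTHY;
        }

        // Neither over- nor under-replicated: only now are unhealthy replicas removed.
        return anyUnhealthy ? Action.DELETE_UNHEALTHY : Action.NONE;
    }

    public static void main(String[] args) {
        // EC 3+2 with index 5 unhealthy: handled like a missing replica.
        List<Replica> replicas =
            List.of(healthy(1), healthy(2), healthy(3), healthy(4), unhealthy(5));
        System.out.println(decide(replicas, 3, 2));
    }
}
```

The `UNRECOVERABLE` branch is the open problem: with too few healthy indexes nothing is reconstructed or replicated, so the remaining unhealthy replicas can still be lost over time.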
