hanishakoneru opened a new pull request #3258: URL: https://github.com/apache/ozone/pull/3258
## What changes were proposed in this pull request?

Currently, a container is marked UNHEALTHY for one of the following reasons:

1. An operation fails on an open or closing container. The container is marked unhealthy so that subsequent write transactions also fail.
2. Container Scrubber is enabled and ContainerMetadataScanner detects an error during KeyValueContainerCheck#fastCheck():
   - The metadata path or chunks path is not accessible as a directory.
   - Container checksum verification fails.
   - The on-disk container YAML data does not match the in-memory container data (ContainerType, ContainerID, Container DBType, Metadata Path).
3. Container Scrubber is enabled and ContainerDataScanner (which runs only on closed and quasi-closed containers) detects a block with a missing or corrupted chunks file.

If a container that is in the “open” state in SCM is reported as unhealthy in the container report, SCM asks the datanodes (DNs) to close the container. But for a “closing” container with an “unhealthy” replica, SCM leaves the container replica as is.

If ReplicationManager does not find a healthy replica for a container, it does not replicate that container. So if a container has only one replica and that replica is unhealthy, SCM will never replicate it, and data can be lost if that single replica is lost for any reason (for example, a disk failure).

If there is a quasi-closed replica and an unhealthy replica, SCM will delete the unhealthy replica. In this scenario, SCM should not delete the unhealthy replica if it can be recovered, because the unhealthy replica may be ahead of the quasi-closed replica. SCM should be more conservative about deleting unhealthy containers, as they could possibly be recovered.

This Jira proposes to let SCM replicate an unhealthy container if there is no other replica. Also, if there is only a quasi-closed replica and an unhealthy replica, SCM should not delete the unhealthy replica.
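
The two proposed rules can be summarized as a small decision function. The sketch below is only an illustration of the description above, not the code in this patch: the class, the enums, and the `decide` method are hypothetical names, the replica states are a simplified stand-in for Ozone's own types, and "no other replica" is assumed to mean "no replica that is not unhealthy". Any case the description does not cover is left to the existing handling.

```java
import java.util.List;

/** Hypothetical sketch of the proposed policy; not Ozone's ReplicationManager. */
public class UnhealthyReplicaPolicySketch {

  /** Simplified replica states (assumption: a stand-in for Ozone's real enum). */
  enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  /** Outcomes used only by this sketch. */
  enum Action {
    REPLICATE_UNHEALTHY,        // no other replica exists, so copy the unhealthy one
    KEEP_UNHEALTHY,             // only quasi-closed peers exist, so do not delete
    DEFER_TO_EXISTING_HANDLING  // not covered by the proposal
  }

  /** Decides what to do for a container, given the states of all its replicas. */
  static Action decide(List<ReplicaState> replicas) {
    if (!replicas.contains(ReplicaState.UNHEALTHY)) {
      return Action.DEFER_TO_EXISTING_HANDLING;
    }
    long others = replicas.stream()
        .filter(s -> s != ReplicaState.UNHEALTHY)
        .count();
    if (others == 0) {
      // Rule 1: an unhealthy replica is all we have, so replicate it.
      return Action.REPLICATE_UNHEALTHY;
    }
    boolean othersAllQuasiClosed = replicas.stream()
        .filter(s -> s != ReplicaState.UNHEALTHY)
        .allMatch(s -> s == ReplicaState.QUASI_CLOSED);
    if (othersAllQuasiClosed) {
      // Rule 2: the unhealthy replica may be ahead of the quasi-closed one.
      return Action.KEEP_UNHEALTHY;
    }
    return Action.DEFER_TO_EXISTING_HANDLING;
  }

  public static void main(String[] args) {
    System.out.println(decide(List.of(ReplicaState.UNHEALTHY)));
    System.out.println(decide(List.of(ReplicaState.QUASI_CLOSED, ReplicaState.UNHEALTHY)));
    System.out.println(decide(List.of(ReplicaState.CLOSED, ReplicaState.UNHEALTHY)));
  }
}
```

Keeping the decision separate from the act of replicating or deleting makes each rule easy to unit test in isolation, which is one possible way to exercise the scenarios described above.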
## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6447

## How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests)
