[ https://issues.apache.org/jira/browse/HDDS-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Kumar reassigned HDDS-11943:
-----------------------------------

    Assignee: Ashish Kumar

> Fail storage volume after numerous reported IO errors
> -----------------------------------------------------
>
>                 Key: HDDS-11943
>                 URL: https://issues.apache.org/jira/browse/HDDS-11943
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Ashish Kumar
>            Priority: Major
>
> Currently, on-demand volume scanning is triggered for IO errors encountered
> while the cluster is running, but the volume can only be failed by a
> configurable number of volume scan failures.
> The volume scanner syncs a file to the disk and reads it back. This check
> alone does not catch some types of volume failures. For example, if older
> sectors of a disk that have already been written to are failing for reads,
> the container scanner will keep raising errors and marking containers
> unhealthy, but the corresponding volume scans will always write their file
> to new sectors that don't have errors.
> To fix this, we can keep a counter of how many IO errors have been reported
> from on-demand scan requests for a volume. If that number crosses a
> configurable count, we can fail the volume even if volume scans are passing.
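
The proposed fix could be sketched roughly as below. This is only an illustration of a per-volume IO error counter with a configurable threshold, not the actual Ozone change: the class name `VolumeIoErrorTracker`, its methods, and the `maxIoErrors` parameter are all hypothetical, and treating "crosses" as "reaches or exceeds" is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Ozone code: counts the IO errors reported through
 * on-demand scan requests for a single volume.
 */
final class VolumeIoErrorTracker {
  // Assumed to come from a configurable setting; the name is illustrative.
  private final int maxIoErrors;
  private final AtomicInteger ioErrors = new AtomicInteger();

  VolumeIoErrorTracker(int maxIoErrors) {
    if (maxIoErrors <= 0) {
      throw new IllegalArgumentException("maxIoErrors must be positive");
    }
    this.maxIoErrors = maxIoErrors;
  }

  /**
   * Records one reported IO error.
   *
   * @return true once the count has reached the threshold, meaning the
   *         volume should be failed even if volume scans are passing.
   */
  boolean reportIoError() {
    return ioErrors.incrementAndGet() >= maxIoErrors;
  }

  /** Returns the number of IO errors reported so far. */
  int getIoErrorCount() {
    return ioErrors.get();
  }
}
```

In this sketch the caller that handles an on-demand scan request would call `reportIoError()` and fail the volume when it returns true, independently of the result of the volume scan itself.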