Kihwal Lee created HDFS-11729:
---------------------------------

             Summary: Improve NNStorageRetentionManager failure handling.
                 Key: HDFS-11729
                 URL: https://issues.apache.org/jira/browse/HDFS-11729
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee

Currently, {{NNStorageRetentionManager}} simply skips a storage directory if it detects a problem with it. Since checkpoint saving does not go through the same set of checks, this can lead to the space exhaustion seen in HDFS-11714. Instead of ignoring such errors, it should handle them properly.

One potential improvement is to catch the exception and report the storage directory failure using {{NNStorage.reportErrorsOnDirectories()}}. {{attemptRestoreRemovedStorage()}} will then need extra checks, e.g. for the existence of a VERSION file.
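
A rough sketch of the proposed handling, for illustration only. It does not use the real Hadoop classes: {{Dir}}, {{Storage}}, {{Purger}} and {{purgeOldStorage()}} are hypothetical stand-ins, and the {{current/VERSION}} location is an assumption about the directory layout. The idea is that a purge failure marks the directory as failed instead of being skipped, and that a failed directory is restored only if its VERSION file exists.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class RetentionSketch {
    /** Hypothetical stand-in for a storage directory. */
    static final class Dir {
        final File root;
        Dir(File root) { this.root = root; }
    }

    /** Hypothetical stand-in for the storage object that tracks directories. */
    static final class Storage {
        final List<Dir> active = new ArrayList<>();
        final List<Dir> removed = new ArrayList<>();

        /** Move failed directories from the active list to the removed list. */
        void reportErrorsOnDirectories(List<Dir> failed) {
            for (Dir d : failed) {
                if (active.remove(d)) {
                    removed.add(d);
                }
            }
        }

        /** Restore a removed directory only if its VERSION file is present. */
        void attemptRestoreRemovedStorage() {
            for (Dir d : new ArrayList<>(removed)) {
                // Assumed layout: <root>/current/VERSION
                File version = new File(new File(d.root, "current"), "VERSION");
                if (version.isFile()) {
                    removed.remove(d);
                    active.add(d);
                }
            }
        }
    }

    /** Hypothetical hook for the per-directory purge work. */
    interface Purger {
        void purge(Dir d) throws IOException;
    }

    /** Purge every active directory; report failures instead of skipping them. */
    static void purgeOldStorage(Storage storage, Purger purger) {
        List<Dir> failed = new ArrayList<>();
        for (Dir d : new ArrayList<>(storage.active)) {
            try {
                purger.purge(d);
            } catch (IOException e) {
                // Previously the directory would just be skipped here.
                failed.add(d);
            }
        }
        if (!failed.isEmpty()) {
            storage.reportErrorsOnDirectories(failed);
        }
    }
}
```

With this shape, a directory that cannot be purged stops receiving new checkpoints once it is reported, which is the behavior needed to avoid filling the disk. Whether the real {{attemptRestoreRemovedStorage()}} should check only the VERSION file or more is left open here, as in the description above.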