Kihwal Lee created HDFS-11729:
---------------------------------
Summary: Improve NNStorageRetentionManager failure handling.
Key: HDFS-11729
URL: https://issues.apache.org/jira/browse/HDFS-11729
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Currently {{NNStorageRetentionManager}} simply skips a storage directory if
a problem is detected. Since checkpoint saving does not go through the same
set of checks, new checkpoints keep being written to a directory that is never
purged, which can lead to the space exhaustion seen in HDFS-11714.
Instead of ignoring errors, it should handle them properly. One potential
improvement is to catch the exception and report the storage directory failure
using {{NNStorage.reportErrorsOnDirectories()}}.
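A rough sketch of the catch-and-report idea follows. It is not the actual patch: {{Purger}}, {{ErrorReporter}} and {{purgeAll()}} are hypothetical stand-ins invented for illustration, and plain {{java.io.File}} is used in place of the real storage directory type so the example compiles on its own.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RetentionFailureSketch {
    /** Hypothetical stand-in for the per-directory purge step. */
    public interface Purger {
        void purgeOldStorage(File dir) throws IOException;
    }

    /** Hypothetical stand-in for NNStorage.reportErrorsOnDirectories(). */
    public interface ErrorReporter {
        void reportErrorsOnDirectories(List<File> failedDirs);
    }

    /**
     * Purges every directory. A failure in one directory is recorded and
     * reported instead of being silently skipped, and does not stop the
     * remaining directories from being purged.
     */
    public static List<File> purgeAll(List<File> dirs, Purger purger,
                                      ErrorReporter reporter) {
        List<File> failed = new ArrayList<>();
        for (File dir : dirs) {
            try {
                purger.purgeOldStorage(dir);
            } catch (IOException e) {
                // Remember the directory so it can be reported as failed.
                failed.add(dir);
            }
        }
        if (!failed.isEmpty()) {
            reporter.reportErrorsOnDirectories(failed);
        }
        return failed;
    }
}
```

Reporting the failed directories in one batch after the loop keeps a single bad directory from interrupting retention on the healthy ones.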
{{attemptRestoreRemovedStorage()}} will need extra checks, e.g. for the
existence of a VERSION file.
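One possible reading of that extra check, as a sketch only: {{looksRestorable()}} is a hypothetical helper, and the {{current/VERSION}} location under the storage root is an assumption about the directory layout, not something stated in this issue.

```java
import java.io.File;

public class RestoreCheckSketch {
    /**
     * Returns true only if the directory is present, writable, and has a
     * VERSION file. The "current/VERSION" path is an assumed layout.
     */
    public static boolean looksRestorable(File storageRoot) {
        File version = new File(new File(storageRoot, "current"), "VERSION");
        return storageRoot.isDirectory()
            && storageRoot.canWrite()
            && version.isFile();
    }
}
```

With a check like this, a directory that exists and is writable but has lost its contents would not be restored automatically.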