Kihwal Lee created HDFS-11729:
---------------------------------

             Summary: Improve NNStorageRetentionManager failure handling.
                 Key: HDFS-11729
                 URL: https://issues.apache.org/jira/browse/HDFS-11729
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee


Currently {{NNStorageRetentionManager}} simply skips a storage directory if 
a problem is detected.  Since checkpoint saving does not go through the same 
set of checks, this can lead to the space exhaustion seen in HDFS-11714.

Instead of ignoring errors, it should handle them properly.  One potential 
improvement is to catch the exception and report the storage directory failure 
using {{NNStorage.reportErrorsOnDirectories()}}. 
{{attemptRestoreRemovedStorage()}} will need extra checks, e.g. the existence 
of a VERSION file.
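
A rough sketch of the proposed flow, for illustration only. It does not use 
the real Hadoop classes: {{RetentionSketch}}, {{Purger}}, and the {{active}} 
and {{removed}} lists are hypothetical stand-ins, the method names merely 
mirror the ones mentioned above, and the {{current/VERSION}} location is an 
assumption about the directory layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the storage bookkeeping; NOT the real NNStorage.
class RetentionSketch {
    /** Hypothetical purge action applied to one storage directory. */
    interface Purger {
        void purge(Path dir) throws IOException;
    }

    final List<Path> active = new ArrayList<>();   // directories in use
    final List<Path> removed = new ArrayList<>();  // directories marked failed

    RetentionSketch(List<Path> dirs) {
        active.addAll(dirs);
    }

    /** Purge each directory; report failures instead of silently skipping. */
    void purgeOldStorage(Purger purger) {
        List<Path> failed = new ArrayList<>();
        for (Path dir : active) {
            try {
                purger.purge(dir);
            } catch (IOException e) {
                // Previously the directory would just be skipped here.
                failed.add(dir);
            }
        }
        reportErrorsOnDirectories(failed);
    }

    /** Move failed directories from the active list to the removed list. */
    void reportErrorsOnDirectories(List<Path> failed) {
        active.removeAll(failed);
        removed.addAll(failed);
    }

    /** Restore a removed directory only if it passes the extra check. */
    void attemptRestoreRemovedStorage() {
        List<Path> restored = new ArrayList<>();
        for (Path dir : removed) {
            // Assumed layout: the VERSION file lives under <dir>/current.
            Path version = dir.resolve("current").resolve("VERSION");
            if (Files.isDirectory(dir) && Files.isRegularFile(version)) {
                restored.add(dir);
            }
        }
        removed.removeAll(restored);
        active.addAll(restored);
    }
}
```

In this sketch a directory whose purge throws ends up in {{removed}}, and it 
only returns to {{active}} once a VERSION file is present again. What the real 
restore path should check beyond that is left open here.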



