Wrong fsimage format while entering recovery mode
-------------------------------------------------

                 Key: HDFS-2749
                 URL: https://issues.apache.org/jira/browse/HDFS-2749
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Denny Ye
            Priority: Critical


Hadoop enters a recovery mode and saves the namespace to disk before the 
system starts serving requests. Many situations can put Hadoop into recovery 
mode, such as a missing VERSION file and a ckpt file left behind by the last 
failed checkpoint.
In recovery mode, the namespace is loaded from the previous fsimage, and the 
default numFiles of namespace.rootDir is 1. The actual numFiles value is read 
from the fsimage (readInt for the version, readInt for the namespaceId, 
readLong for numFiles), but it is not updated in the namespace before the 
namespace is saved.
Because the namespace is saved right after the fsimage is loaded, the default 
value 1 is what gets written to disk as numFiles.
The next time the saved fsimage is loaded from disk, either on reboot or when 
the SecondaryNameNode performs a checkpoint, the system crashes (OOM) because 
this fsimage is incorrect.
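
A minimal Java sketch of the mismatch described above, under stated 
assumptions. This is not the Hadoop code: the class FsImageHeaderSketch, its 
save/load methods, the constant values, and the "one UTF string per file" 
record layout are all hypothetical. Only the header order (int version, int 
namespaceId, long numFiles) is taken from this report. The sketch shows that 
when the header carries the default count 1 while more records follow, a 
reader that trusts the header stops early and leaves bytes behind.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FsImageHeaderSketch {
    // Hypothetical values, for illustration only.
    static final int LAYOUT_VERSION = -18;
    static final int NAMESPACE_ID = 12345;

    /**
     * Writes a simplified image: the header (int, int, long) followed by
     * one UTF-encoded record per path. The caller chooses the count that
     * goes into the header, so a stale value can be simulated.
     */
    static byte[] save(long numFilesInHeader, String[] paths) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(LAYOUT_VERSION);
        out.writeInt(NAMESPACE_ID);
        out.writeLong(numFilesInHeader);
        for (String path : paths) {
            out.writeUTF(path);
        }
        out.flush();
        return buf.toByteArray();
    }

    /**
     * Reads the header, then exactly as many records as the header claims.
     * Returns the number of bytes left unread after that.
     */
    static int load(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        in.readInt();                  // version
        in.readInt();                  // namespaceId
        long numFiles = in.readLong(); // count the reader will trust
        for (long i = 0; i < numFiles; i++) {
            in.readUTF();
        }
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        String[] paths = {"", "/a", "/a/b"}; // root plus two entries

        // Header count matches the records: nothing is left over.
        System.out.println("correct header, leftover bytes: "
                + load(save(paths.length, paths)));

        // Header count stuck at the default 1: records remain unread.
        System.out.println("stale header, leftover bytes: "
                + load(save(1, paths)));
    }
}
```

In this simplified layout the leftover bytes are merely unread. In a real 
image the reader would presumably go on to parse them as whatever section 
comes next, which is one plausible route to the reported OOM, although the 
report itself does not spell out that step.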

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira