[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852660#action_12852660
 ] 

Konstantin Shvachko commented on HDFS-955:
------------------------------------------

> loadFSImage() - why do we need check for directory type, while comparing 
> latest checkpoint times?

IMAGE latest checkpoint can be larger than EDITS latest checkpoint only if the 
two directories are image-only and edits-only respectively. If this is one 
directory which is IMAGE_AND_EDITS then the condition should be treated as an 
error. This could be an assert in current code, but being overprotective seems 
to be appropriate here.

> saveCurrent(), moveCurrent(), moveLastCheckpoint() could all be static. 
> Better still, they should all be methods in StorageDirectory.

saveCurrent() cannot be static, it uses editLog member. All three methods are 
very much name-node specific, while StorageDirectory is intended to have 
methods common for all storage types data-node or name-node. So I decided to 
keep them in FSImage.

> TestSaveNamespace - we should add other tests

I believe these cases are covered in updated TestDFSStorageStateRecovery. The 
test emulates failures on different stages by creating corresponding  storage 
directory configurations and verifies that the nodes recover or fail correctly.

The rest is corrected / updated.
Suresh, thank you for the meticulous review.

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>         Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, 
> hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, 
> saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, 
> saveNamespace.patch, saveNamespace.txt
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to