[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836073#action_12836073
 ] 

Todd Lipcon commented on HDFS-955:
----------------------------------

bq. If EDITS_NEW is not present, but IMAGE_NEW is, this means that the NN 
failure occurred after IMAGE_NEW was successfully written

This is not necessarily the case on a lot of filesystems. As I noted in 
HDFS-970, delayed allocation combined with the default journaling modes in many 
commonly deployed filesystems means that you cannot use the existence of one 
file to determine whether data has been flushed in another. That is to say, 
some filesystems will recover the metadata operations on the EDITS files even 
though the data operations on IMAGE_NEW are incomplete.

The _only_ way we can know that IMAGE_NEW is really on disk across a variety of 
filesystems is to fsync it. Otherwise, when the filesystem is recovered, we 
could roll back to a state where the file is empty but EDITS_NEW has been 
removed.
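
To make the ordering concrete, here is a rough sketch of what I mean. The class and method names are hypothetical and this is not the actual FSImage code; deleting EDITS_NEW just stands in for whatever step of the roll signals that the new image is complete. The point is only that the force() call has to come before that step:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names exist in HDFS.
final class DurableImageWrite {

    // Write the data and force it to stable storage before returning.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) asks the OS to flush file content and metadata
            // (the Java-level equivalent of fsync on this file).
            ch.force(true);
        }
    }

    // Assumed ordering: IMAGE_NEW is synced first, and only then is
    // EDITS_NEW removed, so a missing EDITS_NEW implies a complete image.
    static void saveImage(Path dir, byte[] image) throws IOException {
        Path imageNew = dir.resolve("IMAGE_NEW");
        Path editsNew = dir.resolve("EDITS_NEW");
        writeAndSync(imageNew, image);
        Files.deleteIfExists(editsNew);
    }
}
```

Without the force() call, the delete could be recovered from the journal while the IMAGE_NEW data blocks were never allocated, which is exactly the empty-file case described above.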

bq. During start up the NN decides on whether to discard or to keep IMAGE_NEW 
(and rename it to IMAGE) based on the existence of EDITS_NEW

I agree this is what it does. But I don't think there is then any valid rolling 
order that tolerates arbitrary crashes. See my discussion above.

bq. The contents may be corrupted during failures, it is not safe to rely on 
reading the data from image or edits files

It is safe if we fsync. Metadata can also be corrupted (rolled back to 
indeterminate states) in failures. Especially with the broken way in which we 
currently do image replacement, I don't want to take chances here. This is best 
explained by the presentation linked from this post by Theodore Ts'o: 
http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, 
> PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
