[ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836073#action_12836073 ]
Todd Lipcon commented on HDFS-955:
----------------------------------

bq. If EDITS_NEW is not present, but IMAGE_NEW is, this means that the NN failure occurred after IMAGE_NEW was successfully written

This is not necessarily the case on a lot of filesystems. As I noted in HDFS-970, delayed allocation combined with the default journaling modes in many commonly deployed filesystems means that you cannot use the existence of one file to determine whether data has been flushed in another. That is to say, some filesystems will recover the metadata operations on the EDITS files even though the data operations on IMAGE_NEW are incomplete. The _only_ way we can know that IMAGE_NEW is really on disk across a variety of filesystems is to fsync it. Otherwise, when the filesystem is recovered, we could roll back to a state where IMAGE_NEW is empty but EDITS_NEW has been removed.

bq. During start up the NN decides on whether to discard or to keep IMAGE_NEW (and rename it to IMAGE) based on the existence of EDITS_NEW

I agree this is what it does. But I don't think there is then any valid rolling order that tolerates arbitrary crashes. See my discussion above.

bq. The contents may be corrupted during failures, it is not safe to rely on reading the data from image or edits files

It is safe if we fsync. Metadata can also be corrupted (rolled back to indeterminate states) in failures. Especially with the broken way in which we currently do image replacement, I don't want to take chances here.
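To make the fsync point concrete, here is a minimal sketch of the write-then-fsync-then-rename pattern. It is not the actual FSImage code or a proposed patch: the class and method names (DurableImageWrite, writeDurably) are hypothetical, and only the IMAGE_NEW and IMAGE file names come from the discussion above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not the real HDFS FSImage implementation.
final class DurableImageWrite {

    /**
     * Writes data to tmp, forces it to the storage device, and only then
     * renames tmp over dst. After the rename, the existence of dst implies
     * that its contents were flushed before the name became visible.
     */
    static void writeDurably(Path tmp, Path dst, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            out.write(data);
            // Equivalent of fsync: blocks until file content and metadata
            // have been handed to the underlying device.
            out.getChannel().force(true);
        }
        // Rename only after the data is known to be on disk.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nn-storage");
        Path tmp = dir.resolve("IMAGE_NEW");
        Path dst = dir.resolve("IMAGE");
        writeDurably(tmp, dst, "namespace".getBytes(StandardCharsets.UTF_8));
        System.out.println("IMAGE exists: " + Files.exists(dst));
        System.out.println("IMAGE_NEW exists: " + Files.exists(tmp));
    }
}
```

Without the force() call, a crash could leave the renamed file present but empty on filesystems that use delayed allocation. The sketch does not fsync the containing directory, so the durability of the rename itself still depends on the filesystem.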
The filesystem behavior described above is best explained by the presentation linked from this post by Theodore Ts'o: http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage
> such that all current edits are lost.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.