[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831300#action_12831300
 ] 

Todd Lipcon commented on HDFS-955:
----------------------------------

I've verified the behavior from HDFS-909 on 0.20 (though I'm pretty certain it 
also exists on trunk).

To reproduce, I did a little manual "fault injection" - I added
{code}
if (new File("/tmp/savefsimage.die").exists()) System.exit(1);
{code}
after saving IMAGE_NEW in saveFSImage. I then did the following sequence:

- start NN
- hadoop fs -mkdir test1
- hadoop dfsadmin -safemode enter
- touch /tmp/savefsimage.die
- hadoop dfsadmin -saveNamespace
- (NN "crashes")
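
For anyone who wants to repeat this, the injected check can be pulled out into a small standalone helper. This is only a sketch: the class and method names (CrashPoint, shouldFire, dieIfRequested) are hypothetical and are not part of Hadoop. The marker path is the one used in the steps above.

```java
import java.io.File;

/** Hypothetical fault-injection helper; not part of Hadoop. */
final class CrashPoint {

    /** Returns true when the marker file exists, i.e. the fault should fire. */
    static boolean shouldFire(File marker) {
        return marker.exists();
    }

    /** Exits the JVM abruptly if the marker file is present, simulating a crash. */
    static void dieIfRequested(File marker) {
        if (shouldFire(marker)) {
            System.exit(1);
        }
    }

    public static void main(String[] args) {
        // Default marker path is the one from the reproduction steps.
        File marker = new File(args.length > 0 ? args[0] : "/tmp/savefsimage.die");
        System.out.println("marker present: " + shouldFire(marker));
    }
}
```

A call such as CrashPoint.dieIfRequested(new File("/tmp/savefsimage.die")) would then go at the point where the crash should be simulated. Because the check depends on a marker file, the same build runs normally once the file is removed.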

This leaves dfs.name.dir/current as:
{noformat}
-rw-r--r-- 1 todd todd   4 2010-02-08 21:24 edits
-rw-r--r-- 1 todd todd   4 2010-02-08 21:24 edits.new
-rw-r--r-- 1 todd todd  94 2010-02-08 21:24 fsimage
-rw-r--r-- 1 todd todd 323 2010-02-08 21:24 fsimage.ckpt
-rw-r--r-- 1 todd todd   8 2010-02-08 21:24 fstime
-rw-r--r-- 1 todd todd 100 2010-02-08 21:24 VERSION
{noformat}

(fsimage.ckpt contains the correct image, including my test1 directory)

If I now remove the fault injection file and start the NN, it "recovers" to:
{noformat}
-rw-r--r-- 1 todd todd   4 2010-02-08 21:25 edits
-rw-r--r-- 1 todd todd  94 2010-02-08 21:25 fsimage
-rw-r--r-- 1 todd todd   8 2010-02-08 21:25 fstime
-rw-r--r-- 1 todd todd 100 2010-02-08 21:25 VERSION
{noformat}
(i.e., all edits since the last successful checkpoint were lost)
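
Since restarting the NN discards the leftover files, it may be worth checking for them (and backing them up) before a restart. The following is a minimal sketch, assuming only the file names visible in the listings above. The class name InterruptedSaveCheck is hypothetical and this is not how the NN itself detects or recovers the state.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-restart check; not part of Hadoop. */
final class InterruptedSaveCheck {

    // File names taken from the first directory listing above.
    static final String[] LEFTOVERS = {"fsimage.ckpt", "edits.new"};

    /** Returns the leftover file names present in the given "current" directory. */
    static List<String> leftovers(File currentDir) {
        List<String> found = new ArrayList<>();
        for (String name : LEFTOVERS) {
            if (new File(currentDir, name).isFile()) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: InterruptedSaveCheck <dfs.name.dir>/current");
            System.exit(2);
        }
        List<String> found = leftovers(new File(args[0]));
        if (found.isEmpty()) {
            System.out.println("no leftover checkpoint files found");
        } else {
            System.out.println("leftover files, back up before restarting: " + found);
        }
    }
}
```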

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
