[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

Todd Lipcon (JIRA) Mon, 08 Feb 2010 18:13:52 -0800

    [ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831258#action_12831258 ]


Todd Lipcon commented on HDFS-955:
----------------------------------



I took some pseudocode notes on what's currently going on in the load/save code:

{noformat}
loadFSImage:
  - find a pair of EDITS and IMAGE that have the same checkpoint time and are from the latest checkpointTime
    (this ignores *_NEW)
  - recoverInterruptedCheckpoint:
    if there is an IMAGE_NEW:
      if there is EDITS_NEW:
        delete IMAGE_NEW (since we assume we can replay from IMAGE + EDITS + EDITS_NEW?)
      else:
        replace IMAGE with IMAGE_NEW, delete IMAGE_NEW
  - load IMAGE
  - load EDITS
  - load EDITS_NEW

  - if need to save:
    saveFSImage:
      save IMAGE_NEW
      truncate EDITS
      if EDITS_NEW exists:
        truncate EDITS_NEW
      rollFSImage:
        purgeEditLog:
          replace EDITS with EDITS_NEW
        renameCheckpoint:
          replace IMAGE with IMAGE_NEW
{noformat}
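
To make the recoverInterruptedCheckpoint branch in the notes easier to reason about, here is a minimal Java sketch of just that decision. It is a toy model, not the actual FSImage code: the class name, the StorageFile enum, and the Action names are all made up for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of the recovery rule in the notes above. Not Hadoop code. */
final class RecoverySketch {
    /** The four file roles named in the notes. */
    enum StorageFile { IMAGE, EDITS, IMAGE_NEW, EDITS_NEW }

    /** Hypothetical action names, used only in this sketch. */
    enum Action { NOTHING, DELETE_IMAGE_NEW, PROMOTE_IMAGE_NEW }

    /** Decide what recovery does, given which files exist in a storage dir. */
    static Action recoverInterruptedCheckpoint(Set<StorageFile> present) {
        if (!present.contains(StorageFile.IMAGE_NEW)) {
            return Action.NOTHING;            // no interrupted checkpoint to recover
        }
        if (present.contains(StorageFile.EDITS_NEW)) {
            // Assumes IMAGE + EDITS + EDITS_NEW can still be replayed in full.
            return Action.DELETE_IMAGE_NEW;
        }
        // Replace IMAGE with IMAGE_NEW, which also removes IMAGE_NEW.
        return Action.PROMOTE_IMAGE_NEW;
    }

    public static void main(String[] args) {
        Set<StorageFile> all = EnumSet.allOf(StorageFile.class);
        Set<StorageFile> noEditsNew = EnumSet.of(
                StorageFile.IMAGE, StorageFile.EDITS, StorageFile.IMAGE_NEW);
        System.out.println(recoverInterruptedCheckpoint(all));
        System.out.println(recoverInterruptedCheckpoint(noEditsNew));
    }
}
```

The DELETE_IMAGE_NEW branch is the one that depends on EDITS and EDITS_NEW still being intact, which is exactly what saveFSImage's truncation steps put in question.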

Next I'll look at a failure at each point and see if recovery works. Longer 
term we should also figure out how to use either the FI test framework or 
some clever Mockito spies to inject these failures in unit tests.
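
As a rough illustration of "a failure at each point", the sketch below replays the pseudocode against an in-memory model and simulates a crash after each save step. It uses neither the FI framework nor Mockito. Everything in it is an assumption for illustration: the class and method names, modeling each file as a set of operation names, treating edit replay as an idempotent set union, and treating purgeEditLog as a no-op when EDITS_NEW is absent. It reflects a reading of the notes, not verified behavior of the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** In-memory toy model of the load/save notes above. Not Hadoop code. */
final class SaveCrashSketch {
    // Each "file" is a set of operation names; null means the file is absent.
    // Assumption: replaying edits is modeled as an idempotent set union.
    Set<String> image = new TreeSet<>();
    Set<String> edits = new TreeSet<>();
    Set<String> imageNew = null;
    Set<String> editsNew = null;

    /** Recovery plus load, following the loadFSImage notes. */
    Set<String> load() {
        if (imageNew != null) {               // recoverInterruptedCheckpoint
            if (editsNew == null) {
                image = imageNew;             // replace IMAGE with IMAGE_NEW
            }
            imageNew = null;                  // IMAGE_NEW is gone in both branches
        }
        Set<String> namespace = new TreeSet<>(image);
        namespace.addAll(edits);
        if (editsNew != null) {
            namespace.addAll(editsNew);
        }
        return namespace;
    }

    /** The save sequence from the notes, one Runnable per step. */
    List<Runnable> saveSteps(Set<String> namespace) {
        List<Runnable> steps = new ArrayList<>();
        steps.add(() -> imageNew = new TreeSet<>(namespace));    // save IMAGE_NEW
        steps.add(() -> edits = new TreeSet<>());                // truncate EDITS
        steps.add(() -> {                                        // truncate EDITS_NEW
            if (editsNew != null) editsNew = new TreeSet<>();
        });
        steps.add(() -> {                                        // purgeEditLog
            // Assumption: a no-op when EDITS_NEW is absent.
            if (editsNew != null) { edits = editsNew; editsNew = null; }
        });
        steps.add(() -> { image = imageNew; imageNew = null; }); // renameCheckpoint
        return steps;
    }

    /** Run the first completedSteps save steps, "crash", reload, and compare. */
    static boolean survivesCrash(boolean withEditsNew, int completedSteps) {
        SaveCrashSketch s = new SaveCrashSketch();
        s.image.add("op1");
        s.edits.add("op2");
        if (withEditsNew) {
            s.editsNew = new TreeSet<>();
            s.editsNew.add("op3");
        }
        Set<String> expected = s.load();      // namespace in memory before the save
        List<Runnable> steps = s.saveSteps(expected);
        for (int i = 0; i < completedSteps; i++) {
            steps.get(i).run();
        }
        return s.load().equals(expected);     // restart after the simulated crash
    }

    public static void main(String[] args) {
        for (boolean withEditsNew : new boolean[] {false, true}) {
            for (int done = 0; done <= 5; done++) {
                System.out.printf("EDITS_NEW=%b, crash after %d step(s): %s%n",
                        withEditsNew, done,
                        survivesCrash(withEditsNew, done) ? "recovered" : "EDITS LOST");
            }
        }
    }
}
```

In this model the only runs that come back with a different namespace are the ones where EDITS_NEW exists and the crash lands after EDITS has been truncated but before purgeEditLog completes (two or three completed steps). Recovery then deletes IMAGE_NEW and replays logs that have already been emptied. A real test would inject the failure into the actual FSImage code paths instead of a model like this.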

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
