[
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832212#action_12832212
]
Todd Lipcon commented on HDFS-955:
----------------------------------
h2. Without concurrent checkpoint
Looking at this as a series of state transitions on the storage directory:
{noformat}
State 1: normal operation
Valid: IMAGE + EDITS
(there is nothing special happening)
State 2: createNewIfNotExists
Valid a: IMAGE + EDITS
Valid b: IMAGE + EDITS + EDITS_NEW (since EDITS_NEW is empty)
Current recovery: b
State 3: saving IMAGE_NEW
Same validity as State 2
Current recovery: b
State 4: save IMAGE_NEW complete
Valid a: IMAGE + EDITS
Valid b: IMAGE + EDITS + EDITS_NEW (since EDITS_NEW is empty)
Valid c: IMAGE_NEW + EDITS_NEW (since EDITS_NEW is empty)
Current recovery: b
State 5: truncate EDITS and EDITS_NEW
(a) and (b) are no longer valid
Valid c: IMAGE_NEW + EDITS_NEW
Current recovery: b (incorrect; this is the error we're seeing)
State 6: rollFSImage -> purgeEditLog: moves EDITS_NEW to EDITS
Valid: IMAGE_NEW + EDITS
Current recovery: rename IMAGE_NEW to IMAGE (correct)
State 7: rollFSImage -> renameCheckpoint: moves IMAGE_NEW to IMAGE
Valid: IMAGE + EDITS
Current recovery: no recovery necessary (correct)
{noformat}
The problem here is in State 5. The question is how to detect during
recovery that we are in this state, so we can do the right thing.
This is where HDFS-957 comes in. With HDFS-957, the recovery logic can
easily determine that IMAGE_NEW is correct, and choose the same recovery
mechanism as State 6.
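
As a rough illustration of what "determine that IMAGE_NEW is correct" could look like, the sketch below writes an image behind a placeholder header and only stamps it as complete once the body is fully on disk. The file layout, the stamp values, and the names write_image and image_is_complete are all hypothetical; this is not the HDFS on-disk format, and HDFS-957 may well detect completeness differently.

```python
import os
import struct
import tempfile

STAMP_INCOMPLETE = 0  # hypothetical placeholder written before the body
STAMP_COMPLETE = 1    # hypothetical marker, written only after the body is synced


def write_image(path, payload, crash_before_stamp=False):
    """Write payload behind a 4-byte header that is valid only once the body is on disk."""
    with open(path, "wb") as f:
        f.write(struct.pack(">i", STAMP_INCOMPLETE))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
        if crash_before_stamp:
            return  # simulate dying while the image is still being saved (State 3)
        f.seek(0)
        f.write(struct.pack(">i", STAMP_COMPLETE))
        f.flush()
        os.fsync(f.fileno())


def image_is_complete(path):
    """True only if the header carries the final stamp."""
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and struct.unpack(">i", header)[0] == STAMP_COMPLETE


if __name__ == "__main__":
    storage_dir = tempfile.mkdtemp()
    finished = os.path.join(storage_dir, "IMAGE_NEW.finished")
    partial = os.path.join(storage_dir, "IMAGE_NEW.partial")
    write_image(finished, b"namespace bytes")
    write_image(partial, b"namespace bytes", crash_before_stamp=True)
    print(image_is_complete(finished), image_is_complete(partial))
```

With a check like this, recovery no longer has to infer the state from which files exist: a stamped IMAGE_NEW means the save reached at least State 4, and an unstamped one means it did not.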
h2. With ongoing checkpoint (edit logs already rolled)
{noformat}
State 1: normal operation
Valid: IMAGE + EDITS + EDITS_NEW
Recovery: IMAGE + EDITS + EDITS_NEW
State 2: createNewIfNotExists
no effect - NEW already exists
State 3: saving IMAGE_NEW
Same validity as State 1 (State 2 changed nothing)
Current recovery: IMAGE + EDITS + EDITS_NEW
State 4: save IMAGE_NEW complete
Valid a: IMAGE + EDITS + EDITS_NEW
Valid b: IMAGE_NEW only
Current recovery: a
State 5: truncate EDITS and EDITS_NEW
Valid: IMAGE_NEW (any other recovery is incorrect)
Current recovery: IMAGE + EDITS + EDITS_NEW (incorrect, loses data)
State 6: rollFSImage -> purgeEditLog: moves EDITS_NEW to EDITS
Valid: IMAGE_NEW + EDITS
Current recovery: rename IMAGE_NEW to IMAGE (correct)
State 7: rollFSImage -> renameCheckpoint: moves IMAGE_NEW to IMAGE
Valid: IMAGE + EDITS
Current recovery: no recovery necessary (correct)
{noformat}
So the issue in both cases is essentially the same, and both can be solved
if we use HDFS-957.
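
To make the two tables concrete, here is a toy model of the storage directory in which each file is just a list of transaction ids. It replays the save sequence from State 4 onward, applies the "current recovery" column at each state, and reports where the recovered namespace differs from the true one. Everything here (the dict-based storage, the function names, the sample transactions) is an illustrative assumption and not code from FSImage.

```python
def replay(snap, image, logs):
    """Namespace = image contents plus every listed edit log, in order."""
    state = list(snap[image])
    for log in logs:
        state += snap.get(log, [])
    return state


def save_namespace_states(storage):
    """Yield (state number, snapshot) for States 4-7 of the save sequence."""
    storage = dict(storage)
    storage.setdefault("EDITS_NEW", [])               # State 2: createNewIfNotExists
    storage["IMAGE_NEW"] = replay(storage, "IMAGE", ["EDITS", "EDITS_NEW"])
    yield 4, dict(storage)                            # State 4: IMAGE_NEW complete
    storage["EDITS"], storage["EDITS_NEW"] = [], []
    yield 5, dict(storage)                            # State 5: both logs truncated
    storage["EDITS"] = storage.pop("EDITS_NEW")
    yield 6, dict(storage)                            # State 6: purgeEditLog
    storage["IMAGE"] = storage.pop("IMAGE_NEW")
    yield 7, dict(storage)                            # State 7: renameCheckpoint


def current_recovery(snap):
    """Recovery choices as listed in the 'Current recovery' lines of the tables."""
    if "EDITS_NEW" in snap:
        return replay(snap, "IMAGE", ["EDITS", "EDITS_NEW"])  # States 4-5
    if "IMAGE_NEW" in snap:
        return replay(snap, "IMAGE_NEW", ["EDITS"])           # State 6
    return replay(snap, "IMAGE", ["EDITS"])                   # State 7


def promote_image_new(snap):
    """State 6 recovery applied by hand: rename IMAGE_NEW to IMAGE, then load as usual."""
    fixed = dict(snap)
    fixed["IMAGE"] = fixed.pop("IMAGE_NEW")
    return replay(fixed, "IMAGE", ["EDITS", "EDITS_NEW"])


def lossy_states(storage):
    """State numbers where current recovery does not reproduce the full namespace."""
    expected = replay(storage, "IMAGE", ["EDITS", "EDITS_NEW"])
    return [n for n, snap in save_namespace_states(storage)
            if current_recovery(snap) != expected]


if __name__ == "__main__":
    no_checkpoint = {"IMAGE": [1, 2], "EDITS": [3, 4]}
    ongoing_checkpoint = {"IMAGE": [1, 2], "EDITS": [3], "EDITS_NEW": [4]}
    for name, storage in [("no checkpoint", no_checkpoint),
                          ("ongoing checkpoint", ongoing_checkpoint)]:
        state5 = dict(save_namespace_states(storage))[5]
        print(name, lossy_states(storage), promote_image_new(state5))
```

In this model only State 5 loses edits, in both scenarios, and promoting IMAGE_NEW there restores the full namespace. The model deliberately says nothing about State 4, where the old logs are still intact and promoting IMAGE_NEW without also dealing with them would replay edits twice.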
I'll work on a patch for this.
On a side note, I think there's another race where a checkpoint upload from the
SNN can overlap with this operation and really screw things up. That's a
separate JIRA though.
> FSImage.saveFSImage can lose edits
> ----------------------------------
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.20.1, 0.21.0, 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Attachments: hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage
> such that all current edits are lost.