[ 
https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831808#action_12831808
 ] 

Todd Lipcon commented on HDFS-957:
----------------------------------

bq. I don't understand what problem this solves...

Looking at HDFS-955, it's clear that the current recovery mechanisms are not 
entirely complete. I'm for adding this as another safety guard / sanity check. 
It costs essentially nothing and makes absolutely sure we never try to read an 
unfinished image.

In particular, with this patch we'd be able to fix the issue raised 
[here|https://issues.apache.org/jira/browse/HDFS-955?focusedCommentId=12831806&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12831806]
 fairly trivially. Rather than always assuming that an fsimage.ckpt file is 
incomplete, we could easily recover with no ambiguity.

bq. It is very important in upgrade that we do not write anything into VERSION 
file until IMAGE file is written

This patch doesn't propose to change that. It only changes the order of writes 
within the IMAGE file itself; it doesn't change any ordering with regard to the 
other metadata files.

> FSImage layout version should be written only once file is complete
> -------------------------------------------------------------------
>
>                 Key: HDFS-957
>                 URL: https://issues.apache.org/jira/browse/HDFS-957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-957.txt
>
>
> Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the 
> file, along with some other headers, and then dumps the directory into the 
> file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the 
> layout version, dump all of the data, then seek back to the head of the file 
> to write the proper LAYOUT_VERSION. This would make it very easy to detect 
> the case where the FSImage save got interrupted.
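
The write order described above can be sketched roughly as follows. This is 
only an illustration, not the attached patch or the real FSImage code: the 
class name, the method names, and the choice of 0 as the IMAGE_IN_PROGRESS 
value are all assumptions, and the byte-array payload stands in for the real 
namespace dump.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical sketch of the proposed write order; not the HDFS FSImage code. */
final class ImageSaveSketch {
    // Assumed sentinel value: taken here to never be a valid layout version.
    static final int IMAGE_IN_PROGRESS = 0;

    /** Writes a placeholder header, then the payload, then patches the header. */
    static void save(Path file, int layoutVersion, byte[] payload) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(0);                 // start from an empty file
            out.writeInt(IMAGE_IN_PROGRESS);  // placeholder at the head of the file
            out.write(payload);               // stand-in for the namespace dump
            out.getFD().sync();               // force the data out before marking it complete
            out.seek(0);                      // seek back to the head of the file
            out.writeInt(layoutVersion);      // the real layout version goes in last
            out.getFD().sync();
        }
    }

    /** Returns true only if the header no longer holds the placeholder. */
    static boolean isComplete(Path file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            return in.length() >= Integer.BYTES && in.readInt() != IMAGE_IN_PROGRESS;
        }
    }
}
```

With something like this, a loader could call isComplete() on an fsimage.ckpt 
file and treat a false result as an interrupted save, instead of guessing from 
the file name alone.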

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
