[
https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831580#action_12831580
]
dhruba borthakur commented on HDFS-957:
---------------------------------------
Ok, I see your point.
Suppose the NN issues the write system call to write IMAGE_IN_PROGRESS to the
header initially. It then writes all the data to the file, seeks back to the
layout version, and writes the correct LAYOUT_VERSION. Then the NN tries to
close the file. Meanwhile, all these writes are still buffered in the OS
buffers. When the NN closes the file, the header sector holding the correct
LAYOUT_VERSION is flushed to disk, whereas some other pages of the file
encounter an error while being flushed. This kind of error could be detected
by HDFS-903, couldn't it?
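
To make the ordering concern concrete, here is a minimal sketch (not the code
in hdfs-957.txt) of a save that forces the data to disk before the placeholder
is overwritten. The class name, the method name, and both constant values are
assumptions made up for illustration. On most platforms a failed flush should
surface from force() as an IOException, in which case the header would still
read IMAGE_IN_PROGRESS.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch only; names and values are not taken from the patch.
class ImageSaveSketch {
    // Assumed placeholder value marking an unfinished image.
    static final int IMAGE_IN_PROGRESS = 0;
    // Illustrative layout version; the real value comes from the NN code.
    static final int LAYOUT_VERSION = -19;

    static void save(File file, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);                 // start from an empty file
            raf.writeInt(IMAGE_IN_PROGRESS);  // placeholder header
            raf.write(payload);               // the image data itself
            // Push the data (and the placeholder) to the device before the
            // header is finalized; an I/O error here aborts the save.
            raf.getChannel().force(true);
            raf.seek(0);                      // back to the header
            raf.writeInt(LAYOUT_VERSION);     // mark the image complete
            raf.getChannel().force(true);     // make the final header durable
        }
    }
}
```

With this ordering, the correct layout version can only reach the disk after
the rest of the file has been flushed successfully, at the cost of one extra
sync per save.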
The other question is that the device that stores the FSImage now needs to be
seekable. Was this already the case earlier?
+1 for this patch.
> FSImage layout version should be written only once file is complete
> -------------------------------------------------------------------
>
> Key: HDFS-957
> URL: https://issues.apache.org/jira/browse/HDFS-957
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-957.txt
>
>
> Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the
> file, along with some other headers, and then dumps the directory into the
> file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the
> layout version, dump all of the data, then seek back to the head of the file
> to write the proper LAYOUT_VERSION. This would make it very easy to detect
> the case where the FSImage save got interrupted.
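
The detection side described above could look roughly like the following
sketch. It is an illustration under stated assumptions, not the loader from
the patch: the class name, the method name, and the IMAGE_IN_PROGRESS value
are hypothetical, and the header is assumed to start with a 4-byte int.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical sketch only; names and values are not taken from the patch.
class ImageLoadCheckSketch {
    // Assumed placeholder value marking an unfinished image.
    static final int IMAGE_IN_PROGRESS = 0;

    // Returns the layout version, or throws if the save never completed.
    static int readLayoutVersion(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // An empty or truncated header surfaces as an EOFException here.
            int version = in.readInt();
            if (version == IMAGE_IN_PROGRESS) {
                throw new IOException(
                    "Image file " + file + " was not completely written");
            }
            return version;
        }
    }
}
```

A loader built this way rejects an interrupted image as soon as it reads the
first four bytes, before it parses any of the directory data.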