[
https://issues.apache.org/jira/browse/HDFS-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030244#comment-13030244
]
Eli Collins edited comment on HDFS-1800 at 5/7/11 1:15 AM:
-----------------------------------------------------------
Approach #1 is fine with me. For #3 it's nice that the file is self-verifying
and it's more consistent with the edits log checksumming (admins already need a
special tool to verify the edits log). However it seems like the code will be
simpler if the sum is stored separately (as it already is), and it is nice that an
admin can use {{md5sum}} on the raw image files.
Updated patch looks good. Comments/questions:
* Why is the md5 file format "hash\tfile" instead of the format ("hash *file")
produced by md5sum for binary files?
* MD5FilesUtil should live somewhere like o.a.h.h.util
* Per your question in NNStorage#attemptRestoreRemovedStorage I don't think we
should automatically blow away the contents of a failed storage directory.
* Does the version checking removed in NNStorage#getFields get moved somewhere
else?
* In openEditLog why do we unconditionally write the current txid?
* Will getLoadedImageMd5 ever return null (causing us to skip the check) in
case eg where
* doMerge needs more params in the javadoc
* Will the TODOs, 0xDEADBEEF sums etc be resolved in this jira or another one?
* What case does the TODO in formatOccurred refer to? The given storage
directory and fsimage are out of sync?
* Is there a test that covers a directory with multiple checkpoints/checksum
files?
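On the format question above: a minimal Java sketch of what producing an md5sum-compatible "hash *file" line (binary mode) would look like. {{Md5Format}} and {{md5sumLine}} are hypothetical names for illustration, not part of the patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Format {
    // Compute the MD5 of a file and render it in the line format that
    // `md5sum` emits for binary input: "<hex digest> *<filename>".
    // A side file in this format can be verified with `md5sum --check`.
    static String md5sumLine(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex + " *" + file.getFileName();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("fsimage_", "");
        Files.write(tmp, "hello".getBytes());
        System.out.println(md5sumLine(tmp));
        Files.delete(tmp);
    }
}
```

The "\*" separator is what distinguishes binary mode from text mode in md5sum output; matching it means a plain {{md5sum -c fsimage_N.md5}} would work with no special tooling.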
> Extend image checksumming to function with multiple fsimage files
> -----------------------------------------------------------------
>
> Key: HDFS-1800
> URL: https://issues.apache.org/jira/browse/HDFS-1800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-1800-prelim.txt, hdfs-1800.txt, hdfs-1800.txt
>
>
> HDFS-903 added the MD5 checksum of the fsimage to the VERSION file in each
> image directory. This allows it to verify that the FSImage didn't get
> corrupted or accidentally replaced on disk.
> With HDFS-1073, there may be multiple fsimage_N files in a storage directory
> corresponding to different checkpoints. So having a single MD5 in the VERSION
> file won't suffice. Instead we need to store an MD5 per image file.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira