[ https://issues.apache.org/jira/browse/HDFS-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030244#comment-13030244 ]

Eli Collins edited comment on HDFS-1800 at 5/7/11 2:10 AM:
-----------------------------------------------------------

Approach #1 is fine with me. For #3, it's nice that the file is self-verifying 
and more consistent with the edits log checksumming (admins already need a 
special tool to verify the edits log). However, it seems like the code will be 
simpler if the sum is stored separately (as it already is), and it's nice that 
an admin can use {{md5sum}} on the raw image files.
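
To make that concrete, here is a rough, illustrative sketch (the class and 
method names are made up here, not taken from the patch) of what verifying an 
image against a separately stored MD5 amounts to; the computed digest is the 
same value an admin would see from running {{md5sum}} on the raw file:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only (not from the patch): check an image file
 *  against an MD5 hex string stored in a separate side file. */
public class ImageMd5Check {

  /** Compute the MD5 of a file as a lower-case hex string, the same
   *  value md5sum prints for the raw file. */
  static String md5Hex(File f) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream in = new FileInputStream(f);
    try {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    } finally {
      in.close();
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  /** Throw if the computed digest doesn't match the stored one. */
  static void verify(File imageFile, String expectedHex)
      throws IOException, NoSuchAlgorithmException {
    String actual = md5Hex(imageFile);
    if (!actual.equalsIgnoreCase(expectedHex)) {
      throw new IOException("Image " + imageFile + " is corrupt: expected "
          + expectedHex + " but computed " + actual);
    }
  }
}
{code}

Nothing beyond the raw image bytes and the stored hex string is needed, which 
is the appeal of keeping the sum outside the image file.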

The updated patch looks good. Comments/questions:
* Why is the md5 file format "hash\tfile" instead of the format ("hash *file") 
produced by md5sum for binary files? (See the illustration after this list.)
* MD5FilesUtil should live somewhere like o.a.h.h.util
* Per your question in NNStorage#attemptRestoreRemovedStorage, I don't think we 
should automatically blow away the contents of a failed storage directory.
* Does the version checking that was removed from NNStorage#getFields get moved 
somewhere else?
* In openEditLog why do we unconditionally write the current txid?
* doMerge needs more params in the javadoc
* Will the TODOs, 0xDEADBEEF sums, etc. be resolved in this JIRA or another one?
* What case does the TODO in formatOccurred refer to? That the given storage 
directory and fsimage are out of sync?
* Is there a test that covers a directory with multiple checkpoints/checksum 
files?
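
On the format question above, a small hedged illustration (file names and 
helper methods are made up for this sketch, not taken from the patch). 
{{md5sum -b}} emits lines of the form "<hex digest> *<file>", which 
{{md5sum -c}} can verify directly; a tab-separated "<hex digest>\t<file>" line 
generally cannot be fed back to {{md5sum -c}} without munging:

{code:java}
// Illustration only: helper names are hypothetical, not from the patch.
// md5sum -b output for an (empty) image file would look like:
//   d41d8cd98f00b204e9800998ecf8427e *fsimage_0000000000000042
// whereas a tab-separated line would look like:
//   d41d8cd98f00b204e9800998ecf8427e<TAB>fsimage_0000000000000042
public class ChecksumLineFormat {

  /** Format a line the way md5sum -b would, so md5sum -c can consume it. */
  static String md5sumCompatibleLine(String hexDigest, String fileName) {
    return hexDigest + " *" + fileName + "\n";
  }

  /** Parse either variant back into {digest, fileName}. */
  static String[] parseChecksumLine(String line) {
    return line.trim().split("\t| \\*", 2);
  }
}
{code}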

> Extend image checksumming to function with multiple fsimage files
> -----------------------------------------------------------------
>
>                 Key: HDFS-1800
>                 URL: https://issues.apache.org/jira/browse/HDFS-1800
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-1800-prelim.txt, hdfs-1800.txt, hdfs-1800.txt
>
>
> HDFS-903 added the MD5 checksum of the fsimage to the VERSION file in each 
> image directory. This allows it to verify that the FSImage didn't get 
> corrupted or accidentally replaced on disk.
> With HDFS-1073, there may be multiple fsimage_N files in a storage directory 
> corresponding to different checkpoints. So having a single MD5 in the VERSION 
> file won't suffice. Instead we need to store an MD5 per image file.
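
As a sketch of the per-image layout the description implies (the naming here is 
hypothetical; the actual convention in the HDFS-1073 branch may differ), each 
fsimage_N would get its own checksum side file rather than sharing one entry in 
VERSION:

{code:java}
import java.io.File;

/** Hypothetical sketch: map each checkpoint image to its own MD5 side file,
 *  instead of a single MD5 entry in the VERSION file. */
class ImageChecksumNaming {
  /** e.g. fsimage_0000000000000042 -> fsimage_0000000000000042.md5
   *  (the ".md5" suffix is illustrative). */
  static File md5FileFor(File imageFile) {
    return new File(imageFile.getParentFile(), imageFile.getName() + ".md5");
  }
}
{code}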

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
