[jira] [Commented] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

Aaron T. Myers (JIRA) Tue, 12 Mar 2013 16:57:15 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600637#comment-13600637
 ]


Aaron T. Myers commented on HDFS-3277:
--------------------------------------

Patch largely looks good to me. A few small comments:

# I don't understand why we need to stash away, and potentially reload, the 
FSNS secret manager state when we don't read to the end of the fsimage file 
we're trying to load. Instead of doing all that, why not just throw an IOE and 
have that get handled in FSImage#loadFSImage, just as it would be if we failed 
to load another part of the fsimage? That would end up calling 
DelegationTokenSecretManager#reset, which seems correct to me.
# Seems like there's an extraneous new import in FSNamesystem.
# Recommend adding a class comment to LogAppender, and perhaps renaming that 
class to something that makes it clear it's for interposing on/verifying log 
output.
# Recommend moving the code in TestDFSUpgradeFromImage and TestStartup that 
searches through the LogAppender lines into LogAppender itself, along the 
lines of a "{{int countOccurrencesOf(String)}}".
# In TestStartup#corruptFSImageMD5 you might want to use the constant 
Storage#STORAGE_DIR_CURRENT instead of hard-coding "current".
# In TestStartup#testImageChecksum, consider using 
GenericTestUtils#assertExceptionContains instead of 
"{{ioe.getMessage().contains(...)}}".
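
To illustrate the LogAppender suggestions above, here is a rough sketch of 
what I have in mind. It is not the patch's code: the class name 
LogVerificationHandler is made up, and it is built on java.util.logging only 
so that it stands alone, whereas the real LogAppender would presumably stay on 
whatever logging framework the tests already use. Note that 
countOccurrencesOf here counts captured messages containing the text, not 
every occurrence within a single message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

/**
 * Hypothetical stand-in for the patch's LogAppender. Records every log
 * message it receives so that tests can make assertions about log output.
 */
class LogVerificationHandler extends Handler {
  private final List<String> messages = new ArrayList<>();

  @Override
  public synchronized void publish(LogRecord record) {
    // Keep only the raw message text; level and timestamp are ignored.
    if (record != null && record.getMessage() != null) {
      messages.add(record.getMessage());
    }
  }

  @Override
  public void flush() {
    // Nothing is buffered, so there is nothing to flush.
  }

  @Override
  public void close() {
    // No resources are held.
  }

  /** Returns the number of captured messages that contain the given text. */
  public synchronized int countOccurrencesOf(String text) {
    int count = 0;
    for (String message : messages) {
      if (message.contains(text)) {
        count++;
      }
    }
    return count;
  }
}
```

With something like this, the tests would reduce to attaching the handler, 
running the load, and asserting on the returned count, rather than each test 
looping over the captured lines itself.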
                
> fail over to loading a different FSImage if the first one we try to load is 
> corrupt
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-3277
>                 URL: https://issues.apache.org/jira/browse/HDFS-3277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
> HDFS-3277.004.patch, HDFS-3277.005.patch
>
>
> Most users store multiple copies of the FSImage in order to prevent 
> catastrophic data loss if a hard disk fails.  However, our image loading code 
> is currently not set up to start reading another FSImage if loading the first 
> one does not succeed.  We should add this capability.
> We should also be sure to remove the FSImage directory that failed from the 
> list of FSImage directories to write to, in the way we normally do when a 
> write (as opposed to read) fails.
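
The failover behavior described above can be sketched roughly as follows. 
This is only an illustration under stated assumptions, not the code in the 
attached patches: ImageFailoverSketch, ImageLoader, and loadFirstUsable are 
hypothetical names, and image locations are reduced to plain strings. The idea 
is that any IOException from one candidate moves us on to the next, and the 
candidates that failed are recorded so the caller can drop them from the set 
of directories it later writes to.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of trying several image copies in turn. */
class ImageFailoverSketch {

  /** Hypothetical loader: throws IOException if the image cannot be read. */
  interface ImageLoader {
    void load(String imagePath) throws IOException;
  }

  /**
   * Tries each candidate in order and returns the first one that loads.
   * Every candidate that fails is appended to {@code failed}.
   * Throws IOException if no candidate could be loaded.
   */
  static String loadFirstUsable(List<String> candidates, ImageLoader loader,
      List<String> failed) throws IOException {
    IOException lastFailure = null;
    for (String candidate : candidates) {
      try {
        loader.load(candidate);
        return candidate;
      } catch (IOException e) {
        // Remember the failure and fall through to the next copy.
        failed.add(candidate);
        lastFailure = e;
      }
    }
    throw new IOException("Failed to load any of " + candidates.size()
        + " candidate images", lastFailure);
  }
}
```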

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
