[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600637#comment-13600637 ]
Aaron T. Myers commented on HDFS-3277:
--------------------------------------
Patch largely looks good to me. A few small comments:
# I don't understand the need for the stashing away and potentially reloading
the FSNS secret manager state in the event we don't read to the end of the
fsimage file we're trying to load. Instead of doing all that, why not just
throw an IOE and have that get handled in FSImage#loadFSImage just like it
would be if we failed to load another part of the fsimage? That would end up
calling DelegationTokenSecretManager#reset, which seems correct to me.
# Seems like there's an extraneous new import in FSNamesystem.
# Recommend adding a class comment to LogAppender, and perhaps renaming that
class to something that makes it clear it's for interposing on/verifying log
output.
# Recommend refactoring the code in TestDFSUpgradeFromImage and TestStartup
which searches through the LogAppender lines into LogAppender itself, along the
lines of a "{{int countOccurrencesOf(String)}}".
# In TestStartup#corruptFSImageMD5 you might want to use the constant
Storage#STORAGE_DIR_CURRENT instead of hard-coding "current".
# In TestStartup#testImageChecksum, consider using
GenericTestUtils#assertExceptionContains instead of
"{{ioe.getMessage().contains(...)}}".
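
To make the LogAppender suggestions above concrete, here is a minimal sketch of the kind of helper I have in mind. The class name {{CapturedLog}} and the {{record}} method are hypothetical and not taken from the patch; the real class would receive its lines from the logging framework's appender hook rather than from direct calls. Note that this version counts matching lines, not total matches within a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the class in the patch: collects log lines so a
// test can ask how many of them mention a given string.
final class CapturedLog {
    private final List<String> lines = new ArrayList<>();

    // Stores one log line. In a real appender this would be driven by the
    // logging framework instead of being called directly.
    synchronized void record(String line) {
        lines.add(line);
    }

    // Returns the number of recorded lines that contain the given text.
    synchronized int countOccurrencesOf(String text) {
        int count = 0;
        for (String line : lines) {
            if (line.contains(text)) {
                count++;
            }
        }
        return count;
    }
}
```

With something like this, the tests would no longer need their own loops over the captured lines; each could make a single call to {{countOccurrencesOf}} and assert on the result.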
> fail over to loading a different FSImage if the first one we try to load is
> corrupt
> -----------------------------------------------------------------------------------
>
> Key: HDFS-3277
> URL: https://issues.apache.org/jira/browse/HDFS-3277
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Andrew Wang
> Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch,
> HDFS-3277.004.patch, HDFS-3277.005.patch
>
>
> Most users store multiple copies of the FSImage in order to prevent
> catastrophic data loss if a hard disk fails. However, our image loading code
> is currently not set up to start reading another FSImage if loading the first
> one does not succeed. We should add this capability.
> We should also be sure to remove the FSImage directory that failed from the
> list of FSImage directories to write to, in the way we normally do when a
> write (as opposed to read) fails.
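
As a rough illustration of the failover behavior described above, the following sketch tries each candidate image in order, treats an IOException as "this copy is unusable", resets any partially loaded state, and moves on to the next copy. The {{ImageLoader}} interface and {{FailoverImageLoading}} class are hypothetical stand-ins for illustration only, not HDFS classes, and the sketch does not cover removing the failed directory from the list of directories to write to.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for illustration; this is not an HDFS interface.
interface ImageLoader {
    // Loads one image; throws IOException if the image is corrupt or truncated.
    void load(String imagePath) throws IOException;

    // Discards any state left behind by a partial load.
    void reset();
}

final class FailoverImageLoading {
    // Tries each candidate image in order and returns the first that loads.
    // A failed attempt is rolled back with reset() before the next one starts.
    static String loadFirstUsable(List<String> candidates, ImageLoader loader)
            throws IOException {
        IOException lastFailure = null;
        for (String path : candidates) {
            try {
                loader.load(path);
                return path;
            } catch (IOException e) {
                loader.reset();
                lastFailure = e;
            }
        }
        // Every candidate failed; surface the most recent cause to the caller.
        throw new IOException(
            "Failed to load any of " + candidates.size() + " candidate images",
            lastFailure);
    }
}
```

Routing every failure through one exception path keeps the cleanup in a single place, which is the main reason to prefer it over stashing and restoring state by hand.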
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira