[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246950#comment-13246950
]
Tsz Wo (Nicholas), SZE commented on HDFS-3004:
----------------------------------------------
Sorry for coming late. Some comments:
- Since recovery mode may cause data loss, we should prompt and warn the
user at the very beginning.
- What happens if "-autoChooseDefault" is run with other options or standalone?
- It is hard to understand the option "-autoChooseDefault" from its name.
Could it be changed to something related to "recover", say
"-recoverAutoChooseDefault"?
- Why was JournalStream removed?
- Please rename RequestStop to RequestStopException. It would be better to give
it an error message. Also, please add javadoc describing what it means.
- Could you show some sample screenshots of recovery mode?
- Please remove createRecoveryContext() from HdfsServerConstants. It does not
sound like a constant.
- Why does askOperator(..) belong to FSEditLogLoader rather than RecoveryContext?
- Could you rename RecoveryContext to something related to image/edit, say
ImageEditRecovery?
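
To illustrate the RequestStopException comment above, here is a minimal sketch
of an exception that carries an error message and has javadoc. It is not taken
from the patch: the IOException base class, the javadoc wording, the
handleAnswer helper and the "q" answer are all assumptions made for the
example.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: signals that recovery should stop because the
 * operator asked for it. The message records why recovery was stopped.
 * (Extending IOException is an assumption, not something the patch states.)
 */
class RequestStopException extends IOException {
    private static final long serialVersionUID = 1L;

    public RequestStopException(String message) {
        super(message);
    }
}

public class RequestStopExample {
    /**
     * Hypothetical helper: treats the answer "q" as a request to stop.
     * Any other answer is accepted and recovery would carry on.
     */
    static void handleAnswer(String answer) throws RequestStopException {
        if ("q".equals(answer)) {
            throw new RequestStopException(
                "Operator chose to quit recovery at the prompt");
        }
    }

    public static void main(String[] args) {
        try {
            handleAnswer("q");
        } catch (RequestStopException e) {
            // The message tells the caller why recovery ended.
            System.out.println("Recovery stopped: " + e.getMessage());
        }
    }
}
```

With a message attached, whoever catches the exception can log the reason for
stopping instead of guessing it from the exception type alone.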
> Implement Recovery Mode
> -----------------------
>
> Key: HDFS-3004
> URL: https://issues.apache.org/jira/browse/HDFS-3004
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: tools
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch,
> HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch,
> HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch,
> HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch,
> HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch,
> HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch,
> HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch,
> HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch,
> HDFS-3004.037.patch, HDFS-3004.038.patch,
> HDFS-3004__namenode_recovery_tool.txt
>
>
> When the NameNode metadata is corrupt for some reason, we want to be able to
> fix it. Obviously, we would prefer never to get into this situation. In a
> perfect world, we never would. However, bad data on disk can happen from time
> to time, because of hardware errors or misconfigurations. In the past we have
> had to correct it manually, which is time-consuming and which can result in
> downtime.
> Recovery mode is initiated by the system administrator. When the NameNode
> starts up in Recovery Mode, it will try to load the FSImage file, apply all
> the edits from the edits log, and then write out a new image. Then it will
> shut down.
> Unlike in the normal startup process, the recovery mode startup process will
> be interactive. When the NameNode finds something that is inconsistent, it
> will prompt the operator as to what it should do. The operator can also
> choose to take the first option for all prompts by starting up with the '-f'
> flag, or typing 'a' at one of the prompts.
> I have reused as much code as possible from the NameNode in this tool.
> Hopefully, the effort that was spent developing this will also make the
> NameNode editLog and image processing even more robust than it already is.
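
The interactive behaviour described in the issue (prompt on each
inconsistency, with '-f' or an 'a' answer selecting the first option for all
prompts) can be sketched roughly as below. This is only an illustration under
stated assumptions, not the patch's code: the class name PromptSketch, the
ask method, the option strings, and the choice to let 'a' also answer the
current prompt are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class PromptSketch {
    // True when started with '-f', or once the operator has typed 'a'.
    private boolean autoChooseFirst;
    private final BufferedReader in;

    PromptSketch(boolean force, BufferedReader in) {
        this.autoChooseFirst = force;
        this.in = in;
    }

    /** Returns the chosen option; options[0] is the default choice. */
    String ask(String question, String... options) throws IOException {
        if (autoChooseFirst) {
            return options[0]; // no prompt: take the first option
        }
        while (true) {
            System.out.println(question + " (" + String.join("/", options)
                + ", or 'a' to always take the first option)");
            String line = in.readLine();
            if (line == null) {
                throw new IOException("input closed before an answer was given");
            }
            String answer = line.trim();
            if (answer.equals("a")) {
                // Assumption: 'a' answers this prompt and all later ones.
                autoChooseFirst = true;
                return options[0];
            }
            for (String option : options) {
                if (option.equals(answer)) {
                    return option;
                }
            }
            System.out.println("Unrecognized answer: " + answer);
        }
    }

    public static void main(String[] args) throws IOException {
        // Scripted input stands in for the operator typing at a terminal.
        BufferedReader scripted =
            new BufferedReader(new StringReader("skip\na\n"));
        PromptSketch prompt = new PromptSketch(false, scripted);
        System.out.println(prompt.ask("Bad edit found", "continue", "skip", "quit"));
        System.out.println(prompt.ask("Bad edit found", "continue", "skip", "quit"));
        // After 'a', this call returns without reading any input.
        System.out.println(prompt.ask("Bad edit found", "continue", "skip", "quit"));
    }
}
```

Keeping the "take the first option" state in one object means the '-f' flag
and the 'a' answer share a single code path.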