[jira] [Commented] (HDFS-3004) Implement Recovery Mode

Colin Patrick McCabe (Commented) (JIRA) Thu, 05 Apr 2012 15:20:51 -0700

    [ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247767#comment-13247767 ]


Colin Patrick McCabe commented on HDFS-3004:
--------------------------------------------

Todd:
> this isn't compiling anymore...

Sigh.  Will rebase on trunk... again.

Nicholas:
> Since recovery mode may cause data loss, we should prompt and warn the 
> user at the very beginning.

We prompt the user before doing anything destructive, unless the 
-autoChooseDefault option is enabled.
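
The prompting behavior described here can be illustrated with a minimal sketch. It is not the code from the patch: the class OperatorPrompt, its ask method, and the option strings are hypothetical names chosen for illustration. It assumes the first option is the default, that an auto-choose setting skips prompting entirely, and that typing 'a' takes the first option for this and every later prompt.

```java
import java.util.Scanner;

/** Hypothetical sketch of an operator prompt; not the HDFS-3004 patch code. */
class OperatorPrompt {
    private final Scanner in;
    private boolean autoChooseDefault;

    OperatorPrompt(Scanner in, boolean autoChooseDefault) {
        this.in = in;
        this.autoChooseDefault = autoChooseDefault;
    }

    /**
     * Asks the operator to pick one of the options. The first option is
     * treated as the default. Typing "a" picks the default now and for all
     * later prompts, so in this sketch no option may itself be named "a".
     */
    String ask(String question, String... options) {
        if (options.length == 0) {
            throw new IllegalArgumentException("need at least one option");
        }
        if (autoChooseDefault) {
            return options[0]; // no prompt: take the default silently
        }
        while (true) {
            System.out.println(question + " [" + String.join("/", options)
                + "] (a = always take the first option)");
            if (!in.hasNextLine()) {
                throw new IllegalStateException("input ended before an answer was given");
            }
            String answer = in.nextLine().trim();
            if (answer.equals("a")) {
                autoChooseDefault = true;
                return options[0];
            }
            for (String option : options) {
                if (option.equals(answer)) {
                    return option;
                }
            }
            System.out.println("Unrecognized answer: " + answer);
        }
    }
}

class OperatorPromptDemo {
    public static void main(String[] args) {
        // Scripted input stands in for an operator typing at the console.
        OperatorPrompt prompt = new OperatorPrompt(new Scanner("skip\na\n"), false);
        System.out.println(prompt.ask("Corrupt edit found.", "stop", "skip"));
        System.out.println(prompt.ask("Another corrupt edit.", "stop", "skip"));
        System.out.println(prompt.ask("Yet another.", "stop", "skip"));
    }
}
```

In the demo, the first answer is read from input, the second prompt receives 'a' and returns the first option, and the third call returns the first option without reading any input.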

> What happens if "-autoChooseDefault" is run with other options or standalone?

There are no other options for recovery mode except autoChooseDefault.  Check 
the usage or run -h for more information.

> Why remove JournalStream?

It's dead code that does nothing.

> Please change RequestStop to RequestStopException. It would be better to add 
> an error message to it. Also, please add javadoc to describe what it means.

Ok.
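
For illustration only, the requested change might look something like the sketch below. The superclass (IOException), the javadoc wording, and the example message are assumptions, not taken from the patch; the thread does not say what RequestStop means or what it extends.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: signals that processing should stop, carrying a
 * message that explains why. The exact meaning in the patch is not stated
 * in this thread; this javadoc is an assumption.
 */
class RequestStopException extends IOException {
    private static final long serialVersionUID = 1L;

    RequestStopException(String message) {
        super(message);
    }
}

class RequestStopDemo {
    public static void main(String[] args) {
        try {
            // Example message only; real call sites would supply their own.
            throw new RequestStopException("operator chose to stop");
        } catch (RequestStopException e) {
            System.out.println("Stopping: " + e.getMessage());
        }
    }
}
```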

> Why does askOperator(..) belong to FSEditLogLoader and not RecoveryContext?

Because it relates to FSEditLogLoader, not to RecoveryContext.

> Could you rename RecoveryContext to something related to image/edit, say 
> ImageEditRecovery?

I suppose MetaRecoveryContext could work.  This would avoid confusion with 
"name node lease recovery" or "datanode recovery."  Similarly we could add 
"meta" before some other recovery-related things.
                
> Implement Recovery Mode
> -----------------------
>
>                 Key: HDFS-3004
>                 URL: https://issues.apache.org/jira/browse/HDFS-3004
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
> HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
> HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
> HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
> HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch, 
> HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, 
> HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch, 
> HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, 
> HDFS-3004.037.patch, HDFS-3004.038.patch, 
> HDFS-3004__namenode_recovery_tool.txt
>
>
> When the NameNode metadata is corrupt for some reason, we want to be able to 
> fix it.  Obviously, we would prefer never to get into this situation.  In a 
> perfect world, we never would.  However, bad data on disk can happen from 
> time to time, because of hardware errors or misconfigurations.  In the past 
> we have had to correct it manually, which is time-consuming and can result 
> in downtime.
> Recovery mode is initiated by the system administrator.  When the NameNode 
> starts up in Recovery Mode, it will try to load the FSImage file, apply all 
> the edits from the edits log, and then write out a new image.  Then it will 
> shut down.
> Unlike in the normal startup process, the recovery mode startup process will 
> be interactive.  When the NameNode finds something that is inconsistent, it 
> will prompt the operator as to what it should do.   The operator can also 
> choose to take the first option for all prompts by starting up with the '-f' 
> flag, or typing 'a' at one of the prompts.
> I have reused as much code as possible from the NameNode in this tool.  
> Hopefully, the effort that was spent developing this will also make the 
> NameNode editLog and image processing even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
