[jira] [Commented] (HDFS-3886) Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save a good namespace copy before closing down?

Harsh J (JIRA) Sat, 08 Sep 2012 11:40:09 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451385#comment-13451385
 ]


Harsh J commented on HDFS-3886:
-------------------------------

Yeah, sorry if the 'Shutdown requests' in my title was too ambiguous. I am 
mostly talking about the NN shutdown itself. I'd thought the simple init.d 
framework itself had a kill -9'er (like a supervisor would), but I was wrong.

bq. This would indeed make an RPC to the NN to enter safemode, perform a save 
namespace, and then shut itself down.

So this is currently a simple:
{code}
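# Put the NN into safemode so the namespace stops changing
sudo -u hdfs hdfs dfsadmin -safemode enter
# Save the current namespace to the storage directories
sudo -u hdfs hdfs dfsadmin -saveNamespace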
{code}

However, my only caveat with this is that the logs of any clients that are 
still active will start showing SafeModeExceptions rather than Connection 
Refused, etc. Is that fine to have when doing a clean stop? Would it be better 
if we shut off the client RPC and then issued a namespace save?
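
For illustration only, a stop script could wrap the sequence roughly like 
this. It is a sketch under assumptions: the function name clean_stop_namenode 
and the HDFS_CMD / NN_STOP_CMD variables are made up here, and the default 
stop command assumes hadoop-daemon.sh is on the PATH. None of this comes from 
an existing Hadoop init script.

```shell
#!/bin/sh
# Sketch only: HDFS_CMD, NN_STOP_CMD and clean_stop_namenode are hypothetical
# names chosen for this example, not part of any shipped Hadoop script.
HDFS_CMD="${HDFS_CMD:-sudo -u hdfs hdfs}"
# Assumption: hadoop-daemon.sh is on the PATH of whoever runs this.
NN_STOP_CMD="${NN_STOP_CMD:-hadoop-daemon.sh stop namenode}"

clean_stop_namenode() {
    # 1. Enter safemode so the namespace stops changing. Clients that are
    #    still active will see SafeModeException on writes from here on.
    $HDFS_CMD dfsadmin -safemode enter || return 1
    # 2. Save the namespace (dfsadmin -saveNamespace requires safemode).
    $HDFS_CMD dfsadmin -saveNamespace || return 2
    # 3. Stop the daemon only after the save has succeeded.
    $NN_STOP_CMD || return 3
}
```

A stop script would call clean_stop_namenode and treat a non-zero return as 
"do not proceed to a forced kill". Each step returns a distinct code so the 
caller can tell which one failed. This keeps the SafeModeException behaviour 
discussed above; it does not shut off the client RPC first.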

I also haven't thought about this in HA terms yet.
                
> Shutdown requests can possibly check for checkpoint issues (corrupted edits) 
> and save a good namespace copy before closing down?
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3886
>                 URL: https://issues.apache.org/jira/browse/HDFS-3886
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Priority: Minor
>
> HDFS-3878 sorta gives me this idea. Aside from having a method to download 
> it to a different location, we can also lock up the namesystem (or 
> deactivate the client rpc server) and save the namesystem before we complete 
> the shutdown.
> The init.d/shutdown scripts would have to work with this somehow though, so 
> that they do not kill -9 it while the save is in progress. Also, the new 
> image may be stored in a shutdown.chkpt directory, so as not to interfere 
> with the regular dirs, but still allow easier recovery.
> Obviously this will still not work if all directories are broken. So maybe we 
> could have some configs to tackle that as well?
> I haven't thought this through, so let me know what part is wrong to do :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
