Hi All,
We have been using Hadoop 0.17.1 on a 50-machine cluster.
Since continuous weblogs are being written into HDFS on that cluster, we
are concerned about failure of the namenode. Digging into the Hadoop
documentation, I found that Hadoop currently does not support automatic
recovery of the namenode.
However, I came across a discussion which led me to a patch, HADOOP-2585
- Automatic namespace recovery from secondary image.
From the comments on that patch, what I gather is that applying it
simply lets us start the namenode, after a failure, from the fsimage in
fs.checkpoint.dir instead of from the standard location
<dfs.name.dir>/current/fsimage. So if the namenode goes down and we have
the checkpointed directory on the secondary namenode, the namenode can
be started there; but this would require manually starting the namenode
on the secondary namenode server with the -importcheckpoint option.
Please correct me if anything is wrong with my understanding of
HADOOP-2585.
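For reference, my understanding of that manual recovery step is roughly the following (this assumes the patched namenode honours fs.checkpoint.dir; the path is a placeholder for our actual configuration):

```shell
# On the secondary namenode host, with the patch applied and with
# fs.checkpoint.dir in hadoop-site.xml pointing at the directory
# that holds the checkpointed image:
bin/hadoop namenode -importcheckpoint
```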
For our situation, however, we want a mechanism that will detect a
namenode failure and automatically start the namenode with the
-importcheckpoint option on the secondary namenode server. By
automatically I mean absolutely no manual intervention at the point of
failure and startup. The datanodes on the slaves should also become
aware of the change of namenode automatically, and the cluster as a
whole should keep running without loss of functionality.
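To make the idea concrete, the kind of watchdog we have in mind could be sketched as below. This is only a rough sketch: the host, port, and recovery command are made-up placeholders, and it ignores the hard parts entirely (split-brain, and re-pointing the datanodes at the new namenode).

```python
import socket
import subprocess
import time

def namenode_alive(host, port, timeout=2.0):
    """Return True if something is accepting connections on the namenode RPC port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch_and_recover(host, port, recover_cmd, interval=10.0, max_checks=None):
    """Poll the namenode; on failure, fire the recovery command once and stop.

    recover_cmd would be something like
    ["bin/hadoop", "namenode", "-importcheckpoint"] run on the secondary
    namenode host (placeholder). Returns True if recovery was triggered,
    False if the loop ended without triggering it.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not namenode_alive(host, port):
            subprocess.Popen(recover_cmd)
            return True
        checks += 1
        time.sleep(interval)
    return False
```

The datanode side is the piece this sketch does not solve: they would still need to find the new namenode, e.g. by having fs.default.name point at a DNS name or virtual IP that gets moved over to the secondary host.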
Hence, my questions are:
- Is such a mechanism part of any future Hadoop release?
- Is HADOOP-2585 a step towards incorporating an automatic namenode
recovery mechanism in Hadoop?
- With Hadoop 0.17.1 as we run it today, is there any way to achieve
something close to the mechanism we want?
Thanks,
Pratyush Banerjee