Hi All,

We have been running Hadoop 0.17.1 on a 50-machine cluster.

Since weblogs are being written continuously into HDFS on this cluster, we are concerned about failure of the namenode. Digging into the Hadoop documentation, I found that Hadoop does not currently support automatic recovery of the namenode.

However, I came across a discussion which led me to the patch HADOOP-2585 (Automatic namespace recovery from secondary image).

From the comments on that patch, what I could gather is that applying it simply allows the namenode to be started after a failure from the fsimage in fs.checkpoint.dir, instead of from the standard location <dfs.name.dir>/current/fsimage. So if the namenode goes down and we have the checkpointed directory on the secondary namenode, a namenode can be brought up there. However, this still requires manually starting the namenode on the secondary namenode server with the -importcheckpoint option. Please correct me if anything in my understanding of HADOOP-2585 is wrong.
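
For reference, the manual recovery I am picturing looks roughly like the following. The directory layout and the choice of running the namenode on the secondary machine are assumptions about our setup, not something the patch itself mandates:

    # On the secondary namenode server, after the primary namenode has died,
    # assuming hadoop-site.xml there has dfs.name.dir pointing at an empty
    # directory and fs.checkpoint.dir pointing at the latest checkpoint:
    $ bin/hadoop-daemon.sh stop secondarynamenode
    $ bin/hadoop namenode -importcheckpoint
    # the namenode loads the image from fs.checkpoint.dir, saves it into
    # dfs.name.dir, and then comes up serving that namespace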

For our situation, however, we would like a mechanism that detects a namenode failure and automatically starts the namenode with the -importcheckpoint option on the secondary namenode server. By automatically I mean absolutely no manual intervention at the point of failure and startup. The datanodes on the slaves should also become aware of the change of namenode automatically, so that the cluster as a whole keeps functioning. A rough sketch of the kind of watchdog I have in mind follows below.
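
To make the question concrete, here is a very rough sketch of such a watchdog. The hostnames, RPC port, and Hadoop install path are placeholders for our own environment, and it glosses over how the datanodes would be repointed at the new namenode (e.g. by having fs.default.name resolve via a floating DNS name):

    #!/bin/sh
    # Placeholder values for our environment -- not part of Hadoop itself.
    NN_HOST=namenode.example.com      # current primary namenode
    SNN_HOST=secondary.example.com    # secondary namenode holding the checkpoint
    NN_PORT=9000                      # namenode RPC port from fs.default.name
    HADOOP_HOME=/usr/local/hadoop

    while true; do
        # Treat the namenode as dead only if its RPC port (checked with nc,
        # assuming netcat is installed) fails twice in a row.
        if ! nc -z "$NN_HOST" "$NN_PORT"; then
            sleep 30
            if ! nc -z "$NN_HOST" "$NN_PORT"; then
                # Stop checkpointing, then bring a namenode up on the secondary
                # machine from the checkpoint image via -importcheckpoint.
                ssh "$SNN_HOST" "$HADOOP_HOME/bin/hadoop-daemon.sh stop secondarynamenode"
                ssh "$SNN_HOST" "nohup $HADOOP_HOME/bin/hadoop namenode -importcheckpoint \
                    > /tmp/namenode-import.log 2>&1 < /dev/null &"
                break
            fi
        fi
        sleep 60
    done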

Hence, my questions are:
- Is such a mechanism part of any future Hadoop release?
- Is HADOOP-2585 a step towards incorporating an automatic namenode recovery mechanism in Hadoop?
- Using Hadoop 0.17.1 as we currently do, is there any way to achieve something close to the mechanism we want?

Thank you,

Pratyush Banerjee
