Hi All,
We have been using Hadoop 0.17.1 on a 50-machine cluster.
Since continuous weblogs are being written into HDFS on that cluster, we
are concerned about failure of the namenode. Digging into the Hadoop
documentation, I found that Hadoop currently does not support automatic
recovery of the namenode.
However, I came across a discussion which led me to a patch, HADOOP-2585
- Automatic namespace recovery from secondary image.
From the comments on that patch, what I gather is that applying it
simply lets us start the namenode, after a failure, from the fsimage in
fs.checkpoint.dir instead of from the standard location
<dfs.name.dir>/current/fsimage. So if the namenode goes down and we have
the checkpointed directory on the secondary namenode, the namenode can
be started there; but this would require manually starting the namenode
on the secondary namenode server with the -importcheckpoint option.
Please correct me if anything is wrong with my understanding of
HADOOP-2585.
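For reference, my understanding of that manual recovery step is roughly the following (this assumes the patched namenode honours fs.checkpoint.dir; the path is a placeholder for our actual configuration):

```shell
# On the secondary namenode host, with the patch applied and with
# fs.checkpoint.dir in hadoop-site.xml pointing at the directory
# that holds the checkpointed image:
bin/hadoop namenode -importcheckpoint
```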
For our situation, however, we want a mechanism that will detect a
namenode failure and automatically start the namenode with the
-importcheckpoint option on the secondary namenode server. By
automatically I mean absolutely no manual intervention at the point of
failure and startup. The datanodes on the slaves should also become
aware of the change of namenode automatically, and the cluster as a
whole should keep running without loss of functionality.
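To make the idea concrete, the kind of watchdog we have in mind could be sketched as below. This is only a rough sketch: the host, port, and recovery command are made-up placeholders, and it ignores the hard parts entirely (split-brain, and re-pointing the datanodes at the new namenode).

```python
import socket
import subprocess
import time

def namenode_alive(host, port, timeout=2.0):
    """Return True if something is accepting connections on the namenode RPC port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch_and_recover(host, port, recover_cmd, interval=10.0, max_checks=None):
    """Poll the namenode; on failure, fire the recovery command once and stop.

    recover_cmd would be something like
    ["bin/hadoop", "namenode", "-importcheckpoint"] run on the secondary
    namenode host (placeholder). Returns True if recovery was triggered,
    False if the loop ended without triggering it.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not namenode_alive(host, port):
            subprocess.Popen(recover_cmd)
            return True
        checks += 1
        time.sleep(interval)
    return False
```

The datanode side is the piece this sketch does not solve: they would still need to find the new namenode, e.g. by having fs.default.name point at a DNS name or virtual IP that gets moved over to the secondary host.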
Hence, my questions are:
- Is such a mechanism part of any future Hadoop release?
- Is HADOOP-2585 a step towards incorporating an automatic namenode
recovery mechanism in Hadoop?
- With Hadoop 0.17.1 as we run it today, is there any way to achieve
something close to the mechanism we want?
Thanks,
Pratyush Banerjee