Ideally, what should happen is that when the namenode fails, the secondary namenode should automatically pick up from the last checkpoint without hindering normal operations. The datanodes should also detect this change and update their configuration accordingly. I'm not sure about the behavior of the JobTracker and TaskTrackers in the middle of a running job.
I think we should have a separate JIRA for this.

-Ankur

Dhruba Borthakur wrote:
Just to add to Lohit's comments: make dfs.name.dir an NFS-mounted
directory, so that all transactions are stored on an NFS server. You
can make the NFS server highly available by other means. If the
namenode fails, you can then bring up a new namenode and point it at
the NFS-mounted directory to pick up the transaction log.
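Roughly, the recovery would look something like this (host names and
paths below are just placeholders, not anything specific):

  # on the standby machine, mount the same NFS export the old namenode used
  mount nfshost:/export/hadoop-name /mnt/nfs/hadoop-name

  # with dfs.name.dir in hadoop-site.xml pointing at /mnt/nfs/hadoop-name,
  # start a namenode there; it reads the image and edit log from NFS
  bin/hadoop-daemon.sh start namenode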

There have been some discussions about an HA solution for the
namenode. The discussions have centered around keeping a copy of the
transaction log in HDFS itself (a.k.a. the superblock), or streaming
transaction log updates to multiple downstream namenodes. None of
these solutions has been discussed in detail. If you are willing to
explore these ideas and then possibly contribute to getting them
implemented, that would be awesome!

thanks,
dhruba

On Wed, Jul 23, 2008 at 12:52 AM, lohit <[EMAIL PROTECTED]> wrote:

After going through the comments in the patch, what I could gather is that
applying this patch simply enables us to start the namenode, in case of a
failure, from the fsimage in fs.checkpoint.dir instead of the standard
location <dfs.name.dir>/current/fsimage.
In case the namenode goes down, and we have the checkpointed directory on the
secondary namenode, the namenode can be started up there. But this
would require manually starting the namenode on the secondary namenode
server with the -importcheckpoint option. Please correct me if anything is
wrong in my understanding of HADOOP-2585.
Your understanding is absolutely correct.
But you need to keep one thing in mind: importing from a checkpoint means what it
says, you are importing the latest checkpointed image.
That might not be the latest image your cluster had. For example, the default
checkpoint interval is 60 minutes (I think), which means that if your
namenode stopped, say, 50 minutes after the last checkpoint, importing the
old image would end up losing all 50 minutes of changes. You should
first try to start the namenode as is, with the existing image, and see if it
boots up; sometimes it does. Only if you are unable to should you try the
checkpointed image.
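In other words, the recovery order would be roughly this (the exact option
spelling and prerequisites may vary slightly by release, so treat it as a
sketch):

  # 1. try a normal start from the existing image in dfs.name.dir
  bin/hadoop-daemon.sh start namenode

  # 2. only if that fails, fall back to the secondary's checkpoint:
  #    fs.checkpoint.dir must point at the checkpointed directory, and
  #    dfs.name.dir should be empty so the import does not find an old image
  bin/hadoop namenode -importcheckpoint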

However, for our situation we intend to have a mechanism that will detect
a namenode failure and automatically start the namenode with the
-importcheckpoint option on the secondary namenode server. When I say
automatically, it means absolutely no manual intervention at
the point of failure and startup. The datanodes on the slaves should
also automatically become aware of the change in namenode, and the
cluster as a whole should carry on without any loss of functionality.
Hence, my questions are:
- is such a mechanism part of any future Hadoop release?
As of now, there is no automatic failover.

- is HADOOP-2585 a step towards incorporating an automatic namenode
recovery mechanism in Hadoop?
In some sense yes, and in some sense no.

- currently using Hadoop 0.17.1, is there any way we can achieve
something close to the mechanism we want?
One thing you could do is this. In your config, there is a variable named
dfs.name.dir, which accepts a comma-separated list of directories. The namenode
stores its image and edits in all the directories listed by this config
variable. The normal way of operating is to have one directory local to the
namenode, another on NFS, and possibly another on the same node but on a
different disk. Now, when the namenode goes down, you can try to start it
again; as long as any one of the directories has a good image, the namenode
will pick it up and start functioning.
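For example (the paths here are just illustrative), in hadoop-site.xml that
would look something like:

  <property>
    <name>dfs.name.dir</name>
    <value>/disk1/hadoop/name,/disk2/hadoop/name,/mnt/nfs/hadoop/name</value>
  </property>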

But as we have seen, the namenode does not go down that easily; all exceptions
are caught and logged, keeping the namenode running.

Thanks,
Lohit


