Just to add to Lohit's comments: make one of the dfs.name.dir directories an NFS-mounted directory, so that all transactions are also stored on an NFS server. You can make the NFS server highly available by other means. If the namenode fails, you can then bring up a new namenode and point it at the NFS-mounted directory to pick up the transaction log.
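For concreteness, a corresponding hadoop-site.xml entry could look roughly like this; the paths are only placeholders, and the second one is assumed to be an NFS mount on the namenode host:

  <property>
    <name>dfs.name.dir</name>
    <!-- the namenode writes its image and edit log to every directory
         listed here: one on local disk, one on an NFS mount -->
    <value>/hadoop/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
  </property>

A replacement namenode configured with the same NFS path can then pick up the image and edit log left there by the failed one.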
There have been some discussions about building an HA solution for the namenode. They have centered around keeping a copy of the transaction log in HDFS itself (a.k.a. a superblock), or streaming transaction log updates to multiple downstream namenodes. None of these solutions has been discussed in detail. If you are willing to explore these ideas and then possibly contribute to getting one implemented, that would be awesome!

thanks,
dhruba

On Wed, Jul 23, 2008 at 12:52 AM, lohit <[EMAIL PROTECTED]> wrote:
>
>> After going through the comments in the patch, what I could gather is
>> that applying this patch will simply enable us to start the namenode,
>> in case of a failure, from the fsimage in fs.checkpoint.dir instead of
>> looking in the standard location, <dfs.name.dir>/current/fsimage.
>> In case the namenode goes down, and we have the checkpointed dir on the
>> secondary namenode, the namenode can be started up there. But this
>> would require manually starting the namenode on the secondary namenode
>> server with the -importCheckpoint option. Please correct me if anything
>> is wrong with my understanding of the application of HADOOP-2585.
>
> Your understanding is absolutely correct.
> But you need to keep one thing in mind: importing from a checkpoint means
> exactly what it says, you are importing the latest checkpointed image.
> That might not be the latest image your cluster had. For example, the
> default checkpoint period is 60 minutes (I think), which means that if
> your namenode stopped, say, 50 minutes after the last checkpoint,
> importing the old image would lose all 50 minutes of changes. You should
> first try to start the namenode as is, with the existing image, and see
> if it boots up; sometimes it does. Only if you are unable to should you
> fall back to the checkpointed image.
>
>> However, for our situation we intend to have a mechanism that will
>> detect a namenode failure and automatically start the namenode with the
>> -importCheckpoint option on the secondary namenode server. When I say
>> automatically, it necessarily means absolutely no manual intervention
>> at the point of failure and startup. The datanodes in the slaves should
>> also become aware of the change of namenode automatically, and the
>> cluster as a whole should go on without hampering functionality.
>
>> Hence, my questions are:
>> - is such a mechanism part of any future Hadoop release?
>
> As of now, there is no automatic failover.
>
>> - is HADOOP-2585 a step towards incorporating an automatic namenode
>>   recovery mechanism in Hadoop?
>
> In some sense yes, and in some sense no.
>
>> - currently using Hadoop 0.17.1, is there any way we can achieve
>>   something close to the mechanism we want?
>
> One thing you could do is this. In your config there is a variable named
> dfs.name.dir, which accepts a comma-separated list of directories. The
> namenode stores its image and edits in every directory listed by this
> config variable. The normal way of operating is to have one directory
> local to the namenode, another on NFS, and possibly another on the same
> node but on a different disk. Now, when the namenode goes down, you can
> try to start it again; as long as any one of these directories has a good
> image, the namenode will pick it up and start functioning.
>
> But as we have seen, the namenode does not go down that easily; all
> exceptions are caught and logged, keeping the namenode running.
>
> Thanks,
> Lohit
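To make the recovery procedure Lohit describes concrete, the manual steps on a standby machine would look roughly as follows. This is only a sketch: it assumes a build that actually contains the HADOOP-2585 patch (which adds the -importCheckpoint startup option), a standard tarball layout, and that fs.checkpoint.dir on the standby points at a copy of the secondary namenode's checkpoint directory.

  # 1. Try a normal restart first; if any dfs.name.dir copy is intact,
  #    this path loses no edits at all.
  bin/hadoop-daemon.sh start namenode

  # 2. Only if no dfs.name.dir copy is usable, fall back to the last
  #    checkpoint. Edits made after that checkpoint are lost.
  bin/hadoop namenode -importCheckpoint

Keep Lohit's warning in mind: step 2 can throw away up to a full checkpoint period of changes, so it is a last resort.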

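One more knob worth mentioning, since the 60 minutes above is the default: the checkpoint interval is controlled by fs.checkpoint.period (in seconds) in the secondary namenode's configuration, and lowering it bounds how much can be lost when falling back to -importCheckpoint. A sketch, with an assumed 15-minute interval:

  <property>
    <name>fs.checkpoint.period</name>
    <!-- checkpoint every 900 seconds (15 minutes) instead of the
         default 3600 seconds -->
    <value>900</value>
  </property>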