A couple of things one can do:

1. dfs.name.dir should have at least two locations, one on the local disk and one on NFS. This means that all transactions are synchronously logged to two places.
2. Create a virtual IP, say name.xx.com, that points to the real name of the machine on which the namenode runs. If the namenode machine burns, change the virtual IP to point to a new machine, copy the namenode metadata from the NFS location to the local disk on that new machine, and then start the namenode there. Done!

-dhruba

On Mon, Nov 10, 2008 at 12:24 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Hi Folks,
>
> I am looking for some advice on some of the ways / techniques
> that people are using to get around namenode failures (both disk and
> host).
>
> We have a small cluster with several jobs scheduled for periodic
> execution on the same host where the name server runs. What we would like to
> have is an automatic failover mechanism in hadoop so that a secondary
> namenode automatically takes over the role of the master.
>
> I can move this discussion to a JIRA if people are interested.
>
> Thanks
> -Ankur
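
For what it's worth, the "copy the namenode metadata from the NFS location to the local disk" part of step 2 could be scripted roughly as below. This is only a sketch: the function name and any paths you pass in are hypothetical, it assumes Python 3.8 or later, and it simply copies the directory tree without knowing anything about the namenode's on-disk format. The namenode should not be running on the new machine while this is done.

```python
import shutil
from pathlib import Path


def restore_namenode_metadata(nfs_dir, local_dir):
    """Copy the metadata tree from the NFS copy of dfs.name.dir to local disk.

    Hypothetical helper, not part of Hadoop. It refuses to write into a
    non-empty local directory so existing metadata is never overwritten.
    """
    nfs_dir = Path(nfs_dir)
    local_dir = Path(local_dir)

    if not nfs_dir.is_dir():
        raise FileNotFoundError(f"NFS metadata directory not found: {nfs_dir}")
    if local_dir.exists() and any(local_dir.iterdir()):
        raise FileExistsError(f"local metadata directory is not empty: {local_dir}")

    # Creates local_dir (and any missing parents) and copies everything under nfs_dir.
    shutil.copytree(nfs_dir, local_dir, dirs_exist_ok=True)

    # Return the copied files as sorted relative paths for a quick sanity check.
    return sorted(
        p.relative_to(local_dir).as_posix()
        for p in local_dir.rglob("*")
        if p.is_file()
    )
```

You would call it with the NFS entry and the local entry from dfs.name.dir, compare the returned file list against what is on NFS, and only then start the namenode on the new machine.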