Just to add to Lohit's comments: make one of the dfs.name.dir directories an NFS-mounted directory, so that all transactions are also stored on an NFS server. You can make the NFS server highly available by other means. If the namenode fails, you can then bring up a new namenode and point it at the NFS-mounted directory to pick up the transaction log.
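For concreteness, a corresponding hadoop-site.xml entry could look roughly like this; the paths are only placeholders, and the second one is assumed to be an NFS mount on the namenode host:

  <property>
    <name>dfs.name.dir</name>
    <!-- the namenode writes its image and edit log to every directory
         listed here: one on local disk, one on an NFS mount -->
    <value>/hadoop/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
  </property>

A replacement namenode configured with the same NFS path can then pick up the image and edit log left there by the failed one.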
There have been some discussions about building an HA solution for the namenode. They have centered around keeping a copy of the transaction log in HDFS itself (a.k.a. a superblock), or streaming transaction log updates to multiple downstream namenodes. None of these solutions has been discussed in detail. If you are willing to explore these ideas and then possibly contribute to getting one implemented, that would be awesome!

thanks,
dhruba

On Wed, Jul 23, 2008 at 12:52 AM, lohit <[EMAIL PROTECTED]> wrote:
>
>> After going through the comments in the patch, what I could gather is
>> that applying this patch will simply enable us to start the namenode,
>> in case of a failure, from the fsimage in fs.checkpoint.dir instead of
>> looking in the standard location, <dfs.name.dir>/current/fsimage.
>> In case the namenode goes down, and we have the checkpointed dir on the
>> secondary namenode, the namenode can be started up there. But this
>> would require manually starting the namenode on the secondary namenode
>> server with the -importCheckpoint option. Please correct me if anything
>> is wrong with my understanding of the application of HADOOP-2585.
>
> Your understanding is absolutely correct.
> But you need to keep one thing in mind: importing from a checkpoint means
> exactly what it says, you are importing the latest checkpointed image.
> That might not be the latest image your cluster had. For example, the
> default checkpoint period is 60 minutes (I think), which means that if
> your namenode stopped, say, 50 minutes after the last checkpoint,
> importing the old image would lose all 50 minutes of changes. You should
> first try to start the namenode as is, with the existing image, and see
> if it boots up; sometimes it does. Only if you are unable to should you
> fall back to the checkpointed image.
>
>> However, for our situation we intend to have a mechanism that will
>> detect a namenode failure and automatically start the namenode with the
>> -importCheckpoint option on the secondary namenode server. When I say
>> automatically, it necessarily means absolutely no manual intervention
>> at the point of failure and startup. The datanodes in the slaves should
>> also become aware of the change of namenode automatically, and the
>> cluster as a whole should go on without hampering functionality.
>
>> Hence, my questions are:
>> - is such a mechanism part of any future Hadoop release?
>
> As of now, there is no automatic failover.
>
>> - is HADOOP-2585 a step towards incorporating an automatic namenode
>>   recovery mechanism in Hadoop?
>
> In some sense yes, and in some sense no.
>
>> - currently using Hadoop 0.17.1, is there any way we can achieve
>>   something close to the mechanism we want?
>
> One thing you could do is this. In your config there is a variable named
> dfs.name.dir, which accepts a comma-separated list of directories. The
> namenode stores its image and edits in every directory listed by this
> config variable. The normal way of operating is to have one directory
> local to the namenode, another on NFS, and possibly another on the same
> node but on a different disk. Now, when the namenode goes down, you can
> try to start it again; as long as any one of these directories has a good
> image, the namenode will pick it up and start functioning.
>
> But as we have seen, the namenode does not go down that easily; all
> exceptions are caught and logged, keeping the namenode running.
>
> Thanks,
> Lohit
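To make the recovery procedure Lohit describes concrete, the manual steps on a standby machine would look roughly as follows. This is only a sketch: it assumes a build that actually contains the HADOOP-2585 patch (which adds the -importCheckpoint startup option), a standard tarball layout, and that fs.checkpoint.dir on the standby points at a copy of the secondary namenode's checkpoint directory.

  # 1. Try a normal restart first; if any dfs.name.dir copy is intact,
  #    this path loses no edits at all.
  bin/hadoop-daemon.sh start namenode

  # 2. Only if no dfs.name.dir copy is usable, fall back to the last
  #    checkpoint. Edits made after that checkpoint are lost.
  bin/hadoop namenode -importCheckpoint

Keep Lohit's warning in mind: step 2 can throw away up to a full checkpoint period of changes, so it is a last resort.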

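One more knob worth mentioning, since the 60 minutes above is the default: the checkpoint interval is controlled by fs.checkpoint.period (in seconds) in the secondary namenode's configuration, and lowering it bounds how much can be lost when falling back to -importCheckpoint. A sketch, with an assumed 15-minute interval:

  <property>
    <name>fs.checkpoint.period</name>
    <!-- checkpoint every 900 seconds (15 minutes) instead of the
         default 3600 seconds -->
    <value>900</value>
  </property>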