Re: Namenode cluster and fail over

Konstantin Shvachko Fri, 07 Mar 2008 11:46:15 -0800

We are evaluating a plan to migrate our 400 TB NetApp NAS storage system to
the Hadoop file system.


One crucial requirement for us is high availability and reliability of the
storage system.

From reading the Hadoop architecture and design document, it seems that a
Namenode failure requires manual recovery from the Secondary NameNode. Is that
still the case?


Manual recovery from the Secondary node is the last resort, used only if
everything else has failed.
The Namenode can be configured to save the image and the change logs into
multiple storage directories. We usually configure them to be on different
hard drives on the same machine, or on a drive mounted via NFS.
So even if the whole machine fails, you have a copy of the image that can be
used to start the name-node on a new machine.
You use the Secondary node's copy only if all other copies are unavailable.
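To make that recovery order concrete, here is a small Python sketch of the decision an operator (or a restart script) would make: try each of the Namenode's own storage directories first, and fall back to the Secondary's checkpoint only when none of them holds an image. The comma-separated list mirrors how Hadoop releases of this era configure `dfs.name.dir`, and the checkpoint directory roughly corresponds to `fs.checkpoint.dir`; the `current/fsimage` layout check and all function names are assumptions for illustration, not Hadoop code.

```python
import os
from typing import Callable, List, Optional


def parse_dir_list(value: str) -> List[str]:
    """Split a comma-separated directory list (dfs.name.dir style)."""
    return [p.strip() for p in value.split(",") if p.strip()]


def has_fsimage(directory: str) -> bool:
    """Assumed layout: a usable copy contains current/fsimage."""
    return os.path.isfile(os.path.join(directory, "current", "fsimage"))


def pick_image_dir(name_dirs: List[str],
                   checkpoint_dir: Optional[str] = None,
                   has_image: Callable[[str], bool] = has_fsimage) -> Optional[str]:
    """Return the first directory holding an image: the Namenode's own
    copies (local disks, NFS mount) first, the Secondary's checkpoint
    only as a last resort. Returns None if no copy is found."""
    candidates = list(name_dirs)
    if checkpoint_dir is not None:
        candidates.append(checkpoint_dir)  # last resort
    for d in candidates:
        if has_image(d):
            return d
    return None
```

The `has_image` parameter is injectable so the ordering logic can be checked without touching a real disk; by default it looks for the assumed `current/fsimage` file.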

Are there any plans to develop full replication of the Namenode to the
SecondaryNameNode and to support real-time failover to the SecondaryNameNode
in case of Namenode failure?


Automatic recovery from the secondary node image is one of our primary plans.
It should be done pretty soon.
High availability is also a high priority, but it is not going to be done
tomorrow.
For now you can use some scripting solutions outside of Hadoop, like running
a daemon that pings your name-node once in a while and shuts down and restarts
the cluster if something goes wrong.
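A watchdog of that kind might look roughly like the Python sketch below. It assumes that "pinging" means checking whether the name-node's RPC port accepts a TCP connection, and that the cluster is restarted with the stock `bin/stop-all.sh` and `bin/start-all.sh` scripts under a Hadoop install directory; the host, port, install path and function names are hypothetical.

```python
import socket
import subprocess
import time
from typing import Callable, Optional


def namenode_alive(host: str, port: int, timeout: float = 5.0) -> bool:
    """Treat the name-node as alive if its RPC port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_cluster(hadoop_home: str) -> None:
    """Stop, then start, the whole cluster (assumed script locations)."""
    subprocess.run([f"{hadoop_home}/bin/stop-all.sh"], check=False)
    subprocess.run([f"{hadoop_home}/bin/start-all.sh"], check=True)


def watch(check: Callable[[], bool], restart: Callable[[], None],
          interval: float = 60.0, failures_before_restart: int = 3,
          max_checks: Optional[int] = None) -> int:
    """Poll `check` every `interval` seconds; after the given number of
    consecutive failures, call `restart`. Runs forever unless `max_checks`
    bounds it, and returns how many restarts were issued."""
    misses = restarts = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            misses = 0
        else:
            misses += 1
            if misses >= failures_before_restart:
                restart()
                restarts += 1
                misses = 0
        time.sleep(interval)
    return restarts
```

For example, `watch(lambda: namenode_alive("namenode.example.com", 9000), lambda: restart_cluster("/opt/hadoop"))` would poll once a minute. Requiring several consecutive failures avoids restarting the cluster over a single dropped connection; a real deployment would also want logging and alerting.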
Hope this helps.

--Konstantin
