Himanshu Sharma wrote:
NFS seems to be problematic, as NFS locking causes the namenode to hang.
Couldn't there be some other way? Say the namenode wrote synchronously
to the secondary namenode as well as to its local directories; then, on
namenode failover, we could start the primary namenode process on the
secondary namenode, and the latest checkpointed fsimage would already be there.
NFS shouldn't be used in production datacentres, at least not as the
main way that the nodes talk to a common filesystem.
That doesn't mean it doesn't get used that way, but when the network
plays up, all 1000+ servers suddenly halt on file IO with their logs
filling up with NFS warnings. The problem here is that the OS assumes
file IO is local and fast, while NFS tries to recover "transparently"
by blocking for a while, bringing your apps to a halt in the process. It is
far better to have the failures visible at the app level and let the
application apply whatever policy it wants -which is exactly what the DFS
clients do when talking to the name- or datanodes.
say no to NFS.
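To make "failures visible at the app level" concrete, here is a minimal
sketch of a write through the HDFS client API (the class name, path and
error handling are invented for illustration): if the namenode or datanodes
are unreachable, the call eventually fails with an IOException your code can
catch and handle however it likes, instead of the process blocking inside
the kernel on a stuck NFS mount.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      try {
        FileSystem fs = FileSystem.get(conf);
        // If the namenode or datanodes are unreachable, this eventually
        // surfaces as an IOException instead of blocking forever.
        FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
        out.writeBytes("hello\n");
        out.close();
      } catch (IOException e) {
        // App-level policy: log and carry on, retry later, page an operator...
        System.err.println("HDFS write failed: " + e);
      }
    }
  }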
Alternatives
* Some HA databases have two servers sharing access to the same disk
array at the physical layer, so when the 1ary node goes down, the
secondary can take over. But that assumes it is never the RAID-5
disk array itself that fails; if something very bad happens to the
RAID controller, that assumption may prove to be false.
* SAN storage arrays that route RAID-backed storage to specific nodes in
the cluster. Again, you are hoping that nothing goes wrong behind the
scenes.
* Product placement warning: HP extreme storage with CPUs in the rack
http://h71028.www7.hp.com/enterprise/cache/592778-0-0-0-121.html
I haven't tried bringing up Hadoop on one of these -but it would be
interesting to see how well it works. Maybe Apache could start offering an
"approved by Hadoop" sticker with a yellow elephant on it, to attach to
hardware that is known to work.
This also raises a fundamental question: can we run the secondary namenode
process on the same node as the primary namenode process without any
out-of-memory / heap exceptions? Also, ideally, how much memory should the
primary namenode have on its own, and how much when it shares the node with
the secondary namenode process?
What failures are you planning to deal with? Running the secondary namenode
process on the same machine means that you could cope with a process
failure, but not with a machine failure or a network outage. You'd also need
the 2ary process listening on a second port, so clients would still need to
do some kind of handover.
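As a rough illustration of what "some kind of handover" means on the client
side, here is a hand-rolled sketch -nothing the stock client does for you,
and both addresses plus the class name are made up for this example: try the
primary filesystem URI and fall back to a second process on another port if
the first connection fails.

  import java.io.IOException;
  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class FailoverClientSketch {
    // Both addresses are made up; the fallback only helps if a second
    // namenode process with current metadata is really listening there.
    private static final URI PRIMARY  = URI.create("hdfs://namenode1:9000/");
    private static final URI FALLBACK = URI.create("hdfs://namenode1:9001/");

    public static FileSystem connect(Configuration conf) throws IOException {
      try {
        return FileSystem.get(PRIMARY, conf);
      } catch (IOException e) {
        // Hand-rolled handover: depending on the Hadoop version the failure
        // may only show up here or on the first real operation.
        System.err.println("Primary namenode unreachable, trying fallback: " + e);
        return FileSystem.get(FALLBACK, conf);
      }
    }
  }

Even then, the fallback process has to hold up-to-date metadata and all the
clients have to agree on when to switch, which is exactly the handover
problem above.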