Himanshu Sharma wrote:
NFS seems to be problematic, as NFS locking causes the namenode to hang.
Couldn't there be some other way? Say the namenode wrote synchronously
to the secondary namenode as well as to its local directories; then, on
namenode failover, we could start the primary namenode process on the
secondary namenode, and the latest checkpointed fsimage would already be there.
NFS shouldn't be used in production datacentres, at least not as the
main way that the nodes talk to a common filesystem.
That doesn't mean it doesn't get used that way, but when the network
plays up, all 1000+ servers suddenly halt on file IO with their logs
filling up with NFS warnings. The problem here is that the OS assumes
file IO is local and fast, while NFS tries to recover "transparently"
by blocking for a while, bringing your apps to a halt in the process. It is
far better to have the failures visible at the app level and let the
application apply whatever policy it wants -which is exactly what the DFS
clients do when talking to the name- or datanodes.
say no to NFS.
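To make "failures visible at the app level" concrete, here is a minimal
sketch of a write through the HDFS client API (the class name, path and
error handling are invented for illustration): if the namenode or datanodes
are unreachable, the call eventually fails with an IOException your code can
catch and handle however it likes, instead of the process blocking inside
the kernel on a stuck NFS mount.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      try {
        FileSystem fs = FileSystem.get(conf);
        // If the namenode or datanodes are unreachable, this eventually
        // surfaces as an IOException instead of blocking forever.
        FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
        out.writeBytes("hello\n");
        out.close();
      } catch (IOException e) {
        // App-level policy: log and carry on, retry later, page an operator...
        System.err.println("HDFS write failed: " + e);
      }
    }
  }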
Alternatives
* Some HA databases have two servers sharing access to the same disk
array at the physical layer, so when the 1ary node goes down, the
secondary can take over. But that assumes it is never the RAID-5
disk array itself that fails; if something very bad happens to the
RAID controller, that assumption may prove to be false.
* SAN storage arrays that route RAID-backed storage to specific nodes in
the cluster. Again, you are hoping that nothing goes wrong behind the
scenes.
* Product placement warning: HP extreme storage with CPUs in the rack
http://h71028.www7.hp.com/enterprise/cache/592778-0-0-0-121.html
I haven't tried bringing up Hadoop on one of these -but it would be
interesting to see how well it works. Maybe Apache could start offering an
"approved by Hadoop" sticker with a yellow elephant on it, to attach to
hardware that is known to work.
This also raises a fundamental question: can we run the secondary namenode
process on the same node as the primary namenode process without any
out-of-memory / heap exceptions? Also, ideally, how much memory should the
primary namenode have on its own, and how much when it shares the node with
the secondary namenode process?
What failures are you planning to deal with? Running the secondary namenode
process on the same machine means that you could cope with a process
failure, but not with a machine failure or a network outage. You'd also need
the 2ary process listening on a second port, so clients would still need to
do some kind of handover.
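As a rough illustration of what "some kind of handover" means on the client
side, here is a hand-rolled sketch -nothing the stock client does for you,
and both addresses plus the class name are made up for this example: try the
primary filesystem URI and fall back to a second process on another port if
the first connection fails.

  import java.io.IOException;
  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class FailoverClientSketch {
    // Both addresses are made up; the fallback only helps if a second
    // namenode process with current metadata is really listening there.
    private static final URI PRIMARY  = URI.create("hdfs://namenode1:9000/");
    private static final URI FALLBACK = URI.create("hdfs://namenode1:9001/");

    public static FileSystem connect(Configuration conf) throws IOException {
      try {
        return FileSystem.get(PRIMARY, conf);
      } catch (IOException e) {
        // Hand-rolled handover: depending on the Hadoop version the failure
        // may only show up here or on the first real operation.
        System.err.println("Primary namenode unreachable, trying fallback: " + e);
        return FileSystem.get(FALLBACK, conf);
      }
    }
  }

Even then, the fallback process has to hold up-to-date metadata and all the
clients have to agree on when to switch, which is exactly the handover
problem above.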