On 22/09/11 05:42, praveenesh kumar wrote:
Hi all,
Can we replace our namenode machine later with some other machine?
I have a new server machine in my cluster, and I want to make
this machine my new namenode and jobtracker node.
Also, does the Namenode/JobTracker machine's configuration need to be
better than the datanodes'/tasktrackers'?
1. I'd give it lots of RAM: the namenode holds data about many files in
memory, and you want to avoid swapping.
2. I'd make sure the disks are RAID5, with some NFS-mounted FS that the
secondary namenode can talk to. This avoids the risk of losing the index,
which, if it happens, renders your filesystem worthless. If I were really
paranoid I'd have twin RAID controllers with separate connections to
disk arrays in separate racks, as [Jiang2008] shows that interconnect
problems on disk arrays can be more frequent than HDD failures.
3. If your central switches are at 10 GbE, consider getting a 10 GbE NIC
and hooking it up directly. This stops the network being the bottleneck,
though it does mean the server can have a lot more packets hitting it,
which puts more load on it.
4. Leave space for a second CPU and time for GC tuning.
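
To put a rough number on the RAM point in item 1, here is a minimal
sketch of a heap estimate. It assumes the commonly quoted rule of thumb
of about 150 bytes of namenode heap per file, directory, or block
object; that figure, the function name, and the default of 1.5 blocks
per file are illustrative assumptions, not measurements from any
particular cluster.

```python
def estimate_namenode_heap_bytes(num_files, avg_blocks_per_file=1.5,
                                 num_dirs=0, bytes_per_object=150):
    """Rough namenode heap estimate.

    Counts one in-memory object per file, per directory, and per block,
    and multiplies by an assumed per-object cost (a rule of thumb,
    not a measured value).
    """
    num_blocks = num_files * avg_blocks_per_file
    total_objects = num_files + num_dirs + num_blocks
    return int(total_objects * bytes_per_object)


if __name__ == "__main__":
    # Hypothetical cluster: 10 million files, 100,000 directories.
    heap = estimate_namenode_heap_bytes(num_files=10_000_000,
                                        num_dirs=100_000)
    print(f"estimated namenode heap: {heap / 2**30:.2f} GiB")
```

Treat the result as a lower bound for sizing: the JVM needs headroom
above this for GC, and the file count usually grows over time.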
JTs are less important; they need RAM but use HDFS for storage. If your
cluster is small, the NN and JT can run on the same machine. If you do
this, set up DNS so that two hostnames point to the same network
address. Then if you ever split them off, everyone whose bookmark says
http://jobtracker won't notice.
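
A quick way to check the two-hostname setup from a client machine is
sketched below. The hostname "namenode" is a placeholder of mine
("jobtracker" comes from the bookmark example above), and the helper
name and the injectable resolver are assumptions made so the check can
be exercised without a live DNS server.

```python
import socket


def same_address(host_a, host_b, resolve=socket.gethostbyname):
    """Return True if both hostnames resolve to the same IPv4 address.

    `resolve` defaults to the system resolver; pass any callable that
    maps a hostname to an address string to test without DNS.
    """
    return resolve(host_a) == resolve(host_b)
```

While the NN and JT share a machine, same_address("namenode",
"jobtracker") should return True. After a split, repoint one DNS
record and it should return False, with no client changes needed.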
Either way: the NN and the JT are the machines whose availability you
care about. The rest is just a source of statistics you can look at later.
-Steve
[Jiang2008] "Are disks the dominant contributor for storage failures?: A
comprehensive study of storage subsystem failure characteristics". ACM
Transactions on Storage.