Hi,
Our group is trying to set up a prototype for what will eventually
become a cluster of ~50 nodes.
Does anyone have experience with a stateless Hadoop cluster set up on
CentOS using the method described below? Are there any caveats to a
read-only root file system approach? It would save us from having to
keep a root volume on every system (whether installed on a USB thumb
drive or on a RAID 1 of bootable / partitions).
http://citethisbook.net/Red_Hat_Introduction_to_Stateless_Linux.html
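Roughly, this is the kind of read-only-root setup we have in mind
(just a sketch, assuming the readonly-root support that ships in the
stock CentOS initscripts package; the rwtab entries are only a guess
at what our nodes would still need writable):

    # /etc/sysconfig/readonly-root -- enable the stock read-only root support
    READONLY=yes

    # /etc/rwtab.d/hadoop -- paths kept writable on tmpfs
    # ("empty" mounts an empty tmpfs there, "dirs" recreates the
    #  directory tree without contents, "files" copies contents in)
    dirs    /var/run
    dirs    /var/lock
    empty   /tmp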
We would like to keep the OS root file system separate from the
Hadoop filesystem(s) for maintenance reasons, so we can hot-swap the
data disks while the system is running.
We were also considering installing the root filesystem on USB flash
drives, making it persistent yet still separate. However, given the
limited write cycles of USB flash drives, we would identify and turn
off anything that causes excess writes to the root filesystem,
keeping I/O writes to it to a minimum. The main step would be storing
the Hadoop logs on the same filesystem/drive as the directories we
specify in dfs.data.dir/dfs.name.dir.
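Concretely, something like this is what we mean (only a sketch; the
/mnt/d* paths match the layout further down, and we are still on the
old-style dfs.name.dir/dfs.data.dir property names):

    # hadoop-env.sh -- keep logs off the USB root, on the first data disk
    export HADOOP_LOG_DIR=/mnt/d0/hadoop/logs

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.name.dir</name>
      <value>/mnt/d0/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/d0/dfs/data,/mnt/d1/dfs/data,/mnt/d2/dfs/data</value>
    </property>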
In the end we would have something like this:
USB (MS-DOS partition table + ext2/3/4 partitions)
/dev/sda
/dev/sda1 mounted as / (possibly read-only)
/dev/sda2 mounted as /var (read-write)
/dev/sda3 mounted as /tmp (read-write)
Hadoop disks (either no partition table or GPT, since an MS-DOS
partition table cannot address these 3TB disks)
/dev/sdb /mnt/d0
/dev/sdc /mnt/d1
/dev/sdd /mnt/d2
/mnt/d0 would contain all Hadoop logs.
Hadoop configuration files would still reside on /.
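And a rough /etc/fstab sketch of how we would mount it all (ext4, the
mount options, and the nofail flag on the data disks are just our
assumptions for limiting writes and for letting a node boot with a
disk pulled):

    # /etc/fstab (sketch for one node)
    # USB root: read-only, noatime to avoid access-time writes
    /dev/sda1  /        ext4  ro,noatime               1 1
    /dev/sda2  /var     ext4  rw,noatime               1 2
    /dev/sda3  /tmp     ext4  rw,noatime               1 2
    # Hadoop data disks: whole-disk filesystems, no boot-time fsck,
    # nofail so a missing/hot-swapped disk does not block boot
    /dev/sdb   /mnt/d0  ext4  defaults,noatime,nofail  0 0
    /dev/sdc   /mnt/d1  ext4  defaults,noatime,nofail  0 0
    /dev/sdd   /mnt/d2  ext4  defaults,noatime,nofail  0 0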
Any issues with such a setup? Are there better ways of achieving this?