On Aug 10, 2011, at 2:22 AM, Oded Rosen wrote:
> Hi,
> What is the best practice regarding disk allocation on hadoop data nodes?
> We plan on having multiple storage disks per node, and we want to know if we
> should save a smaller, separate disk for the os (centos).
> Is it the suggested configuration, or is it ok to let the OS reside on one of
> the HDFS storage disks?
It's a waste to dedicate a separate disk to the OS. Every spindle adds
performance, especially for MR spills.
I'm currently configuring:
disk 1 - os, swap, app area, MR spill space, HDFS space
disk 2 through n - swap, MR spill space, HDFS space
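As a sketch, that layout maps onto 0.20-era Hadoop config roughly like this (the mount points /data1 through /dataN are my assumption; substitute your own):

```xml
<!-- hdfs-site.xml: one HDFS partition per disk -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs,/data2/hdfs,/data3/hdfs</value>
</property>

<!-- mapred-site.xml: separate MR spill partitions -->
<property>
  <name>mapred.local.dir</name>
  <value>/data1/mapred,/data2/mapred,/data3/mapred</value>
</property>
```

With separate partitions, the two value lists point at different filesystems; with the combined layout they'd point at directories on the same filesystem.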
The usual reason people give for putting the OS on a separate disk is
to make upgrades easier, since you won't have to touch the application area.
The reality is that you're going to blow away the entire machine during an
upgrade anyway, so don't worry about that situation.
I know a lot of people combine the MR spill space and HDFS space onto
the same partition, but I've found that keeping them separate has two
advantages:
* You no longer have to deal with the stupid math HDFS uses for reserved
space--there's no question as to how much space you actually have
* A hard limit on MR space kills badly written jobs before they eat up
enough space to nuke HDFS
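To make that math concrete, here's a toy calculation (all the numbers are invented for illustration): on a shared partition, the space HDFS can actually use floats with MR's spill usage even after dfs.datanode.du.reserved is carved out, while a dedicated partition is a fixed, hard limit.

```python
# Toy comparison of shared vs. dedicated HDFS partitions.
GB = 1024 ** 3

# Shared partition: HDFS, MR spill, and everything else share one
# filesystem; dfs.datanode.du.reserved only subtracts a fixed headroom.
disk = 2000 * GB          # raw partition size
reserved = 100 * GB       # dfs.datanode.du.reserved
mr_spill = 250 * GB       # whatever MR happens to be using right now
hdfs_available_shared = disk - reserved - mr_spill  # a moving target

# Dedicated partition: the filesystem itself is the limit.
hdfs_partition = 1600 * GB
hdfs_available_dedicated = hdfs_partition           # fixed, no math

print(hdfs_available_shared // GB)     # 1650 -- but shrinks as MR spills grow
print(hdfs_available_dedicated // GB)  # 1600 -- always
```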
Of course, the big disadvantage is that one needs to calculate the
correct space needed, and that's a toughie. But if you know your applications,
it's not a problem. Besides, if you get it wrong, you can always do a rolling
re-install to fix it.
Also note that in this configuration one cannot take advantage of the
"keep the machine up at all costs" features in newer Hadoop releases, which
require that root, swap, and the log area be mirrored to be truly effective.
I'm not yet convinced those features are worth it for anything smaller than
maybe a 12-disk config.
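For reference, the knob I have in mind is dfs.datanode.failed.volumes.tolerated (assuming you're on a release that has it), which lets a DataNode ride out dead data disks instead of shutting down:

```xml
<!-- hdfs-site.xml: keep the DataNode running after one failed
     data volume instead of taking the whole node down -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```

It only really saves you if the OS, swap, and log areas survive the disk failure too, hence the mirroring requirement.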