Bill Au wrote:
Is hadoop designed to run on homogeneous hardware only, or does it work just
as well on heterogeneous hardware as well? If the datanodes have different
disk capacities, does HDFS still spread the data blocks equally among all
the datanodes, or will the datanodes with higher disk capacity end up storing
more data blocks? Similarly, if the tasktrackers have different numbers of
CPUs, is there a way to configure hadoop to run more tasks on those
tasktrackers that have more CPUs? Is that simply a matter of setting
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum differently on the tasktrackers?
Bill
Life is simpler on homogeneous boxes; by setting the maximum tasks
differently for the different machines, you do limit the amount of work
that gets pushed out to those boxes. More troublesome are slower
CPUs/HDDs; those aren't picked up directly, though speculative execution
can handle some of this.
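
Yes, setting those two properties per machine is the right pair of knobs.
As a rough sketch (the slot counts below are made up for illustration,
not recommendations), the mapred-site.xml on a node with more cores might
contain:

  <configuration>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <!-- illustrative value: roughly one map slot per core on an 8-core box -->
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <!-- illustrative value: fewer reduce slots than map slots is typical -->
      <value>4</value>
    </property>
  </configuration>

Each tasktracker reads these from its own local config, so the values can
differ from machine to machine without any central coordination.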
One interesting bit of research would be something adaptive: something
to monitor throughput and tune those values based on performance. That
would detect variations in a cluster and work with it, rather than
requiring you to know the capabilities of every machine.
-steve