Is Hadoop designed to run only on homogeneous hardware, or does it work just as well on heterogeneous hardware?

If the datanodes have different disk capacities, does HDFS still spread the data blocks equally among all the datanodes, or will the datanodes with more disk capacity end up storing more data blocks?

Similarly, if the tasktrackers have different numbers of CPUs, is there a way to configure Hadoop to run more tasks on the tasktrackers that have more CPUs? Is that simply a matter of setting mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum differently on each tasktracker?
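
To make the last question concrete, this is roughly what I imagine putting in the site configuration of a tasktracker that has more cores. The file name (mapred-site.xml, or hadoop-site.xml on older releases) and the values 8 and 4 are assumptions for illustration only, not tested settings. Nodes with fewer cores would carry smaller values in their own copy of the file.

```xml
<!-- Hypothetical per-node override on a tasktracker with more CPUs. -->
<!-- The values below are made up for illustration. -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this tasktracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <!-- Maximum number of reduce tasks this tasktracker runs at once -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```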
Bill
