Hello,

Much of the Hadoop documentation speaks to large clusters of commodity 
machines. There is a debate on our end about which would be better: a small 
number of high-performance machines (2 boxes, each with 4 quad-core 
processors) or X number of commodity machines. I suspect disk I/O would be 
the bottleneck with the 2 high-performance machines (though I did just read 
in the FAQ that the DFS data can be split across multiple drives).
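For anyone else wondering about the multi-drive setup mentioned in the FAQ: as I understand it, the DataNode takes a comma-separated list of directories, so something like the following in hadoop-site.xml would spread block storage across drives (a sketch; the property name and the mount paths here are my assumptions, not from the FAQ):

```xml
<!-- Sketch: round-robin DataNode block storage across three drives.
     Property name and paths are assumptions based on my reading of the FAQ. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
</property>
```

With this, each box gets the aggregate I/O of its drives rather than a single spindle, which is part of what we're weighing in the question below.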

So this is a "which would you rather" question. If you were setting up a 
cluster of machines to perform data rollups/aggregation (and other MapReduce 
tasks) on files in the 0.25-1 TB range, which would you rather have:

1. 2 machines, each with 4 quad-core processors, with your choice of RAM and number of drives
2. 10 (or more) commodity machines (as defined on the Hadoop wiki)

And of course a "why?" would be very helpful.

Thanks!
