Hello,

Much of the Hadoop documentation speaks to large clusters of commodity machines. There is a debate on our end about which would be better: a small number of high-performance machines (two boxes, each with four quad-core processors) or X number of commodity machines. I feel that disk I/O might be the bottleneck with the two high-performance machines (though I did just read in the FAQ that the dfs-data can be split across multiple drives).
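For anyone else following along, here is a minimal sketch of the kind of setting that FAQ entry describes, assuming an hdfs-site.xml (or hadoop-site.xml on older releases) using the dfs.data.dir property; the mount points are hypothetical:

    <property>
      <name>dfs.data.dir</name>
      <!-- Comma-separated list of directories; the DataNode spreads new
           blocks across them, so ideally each sits on its own physical
           drive to parallelize disk I/O. -->
      <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
    </property>

Even with that, the two big boxes would still have far fewer spindles in total than ten commodity machines, which is part of what prompts the question below.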
So this is a "which would you rather" question. If you were setting up a cluster of machines to perform data rollups/aggregation (and other MapReduce tasks) on files in the 0.25-1 TB range, which would you rather have:

1. Two machines, each with four quad-core processors, with your choice of RAM and number of drives
2. Ten (or more) commodity machines (as defined on the Hadoop wiki)

And of course a "why?" would be very helpful. Thanks!
