For me, there are three configurations available:

A) Database-class machine with many (>10) fast SAS drives, >10GB memory, and dual or quad x quad-core CPUs. Let's say that this costs about $20K.
B) Generic production machine with 2 x 250GB SATA drives, 4-12GB RAM, and dual x dual-core CPUs (e.g. a Dell 1950). Cost is about $2K.

C) POS beige-box machine with 2 x SATA drives of variable size, 4GB RAM, and a single dual-core CPU. Cost is about $1K.

For a $50K budget, I would take 25x(b) over 50x(c) because the admin issues are simpler and smaller, even though cost/performance would be nominally about the same. I would avoid 2x(a) like the plague.

On 11/7/07 11:56 AM, "Chris Fellows" <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Much of the Hadoop documentation speaks to large clusters of commodity
> machines. There is a debate on our end about which would be better: a small
> number of high-performance machines (2 boxes with 4 quad-core processors) or X
> number of commodity machines. I feel that disk I/O might be the bottleneck
> with the 2 high-performance machines (though I did just read in the FAQ about
> being able to split the dfs-data across multiple drives).
>
> So this is a "which would you rather" question. If you were setting up a
> cluster of machines to perform data rollups/aggregation (and other mapred
> tasks) on files in the 0.25-1TB size range, which would you rather have:
>
> 1. 2 machines with 4 quad-core processors each, with your choice of RAM and
>    number of drives
> 2. 10 (or more) commodity machines (as defined on the Hadoop wiki)
>
> And of course a "why?" would be very helpful.
>
> Thanks!
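The $50K budget arithmetic in the reply can be checked with a short Python sketch. It only counts how many machines, and how much raw aggregate hardware (drives, RAM, cores), a fixed budget buys for each configuration; it says nothing about admin cost or actual Hadoop throughput. Where the reply gives a range or only a lower bound, the plugged-in values are assumptions (12 drives, 16GB RAM and 16 cores for A; the low end of 4GB RAM for B), and the names `MachineClass` and `fleet` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineClass:
    name: str
    cost_usd: int
    drives: int
    ram_gb: int
    cores: int


# Prices and specs follow the reply; values marked "assumed" fill in a range
# or a lower bound and are illustrative only.
OPTIONS = [
    # assumed: ">10" SAS drives -> 12, ">10GB" RAM -> 16, quad x quad core -> 16
    MachineClass("A (database class)", 20_000, drives=12, ram_gb=16, cores=16),
    # assumed: low end of the 4-12GB RAM range; dual x dual core -> 4 cores
    MachineClass("B (generic production)", 2_000, drives=2, ram_gb=4, cores=4),
    # single dual-core CPU -> 2 cores
    MachineClass("C (beige box)", 1_000, drives=2, ram_gb=4, cores=2),
]


def fleet(option: MachineClass, budget_usd: int) -> dict:
    """Buy as many whole machines as the budget allows and total their resources."""
    n = budget_usd // option.cost_usd  # whole machines only
    return {
        "machines": n,
        "drives": n * option.drives,
        "ram_gb": n * option.ram_gb,
        "cores": n * option.cores,
        "spent_usd": n * option.cost_usd,
    }


for opt in OPTIONS:
    print(opt.name, fleet(opt, 50_000))
```

Under these assumptions, $50K buys 25 B-class boxes (50 drives, 100 cores) or 50 C-class boxes (100 drives, 100 cores), while 2 A-class machines spend only $40K for 24 drives and 32 cores. The equal core count for B and C is one way to read "nominally about the same" cost/performance; the machine count (25 vs 50) is what drives the admin burden.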
