Just found this document that seems to answer all my initial questions. http://wiki.apache.org/hadoop/MachineScaling
Thanks anyway,

--
Sérgio Nunes

On Wed, Mar 5, 2008 at 3:16 PM, S. Nunes <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to deploy a small Hadoop cluster for our research lab.
> We are in the process of selecting the hardware for this cluster. We
> are aiming at a 12 CPU, 5 TB cluster. This is obviously a very rough
> estimation.
>
> I have a few questions and I would greatly appreciate your feedback.
>
> Which is better, a cluster based on many low performance nodes; or a
> cluster with fewer but high performance nodes? For instance, should I
> bet on a cluster with 4 nodes (1 CPU + 100 GB each) or on a cluster
> with 2 nodes (2 CPU + 200 GB each)?
>
> What should be considered regarding node homogeneity? I understand
> that a very unbalanced cluster would result in a "long tailed"
> performance - slower nodes would penalize the overall performance.
> However, how critical is that? Do you have performance numbers to
> support our decision?
>
> Finally, do you recommend any specific hardware configuration for
> starting a cluster (rack, blade, tower...) ?
>
> Thanks in advance for your comments,
>
> --
> Sérgio Nunes
>
