Hi.

We have a deployment of 10 Hadoop servers and now need more mapping capacity (no, not just more mappers per instance) since I have so many jobs running. So I am wondering what I should aim for: memory, CPU or disk? "How long is a piece of string?", you might say.
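
To be explicit about what I mean by "mappers per instance": the per-TaskTracker map slot setting, which if I remember the property name right is the one below (from the stock 0.19/0.20 config, so take it as an example of the knob rather than our exact setup):

> mapred.tasktracker.map.tasks.maximum   (max simultaneous map tasks per node)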

A typical server currently runs at about 15-20% CPU; each box is a quad-core
2.4 GHz machine with 8 GB RAM and two 500 GB SATA disks in RAID 1.

Some stats below:
> mpstat 2 5
Linux 2.6.24-19-server (mapreduce2)     06/26/2009

11:36:13 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:36:15 PM  all   22.82    0.00    3.24    1.37    0.62    2.49    0.00   69.45   8572.50
11:36:17 PM  all   13.56    0.00    1.74    1.99    0.62    2.61    0.00   79.48   8075.50
11:36:19 PM  all   14.32    0.00    2.24    1.12    1.12    2.24    0.00   78.95   9219.00
11:36:21 PM  all   14.71    0.00    0.87    1.62    0.25    1.75    0.00   80.80   8489.50
11:36:23 PM  all   12.69    0.00    0.87    1.24    0.50    0.75    0.00   83.96   5495.00
Average:     all   15.62    0.00    1.79    1.47    0.62    1.97    0.00   78.53   7970.30

What I am wondering is: is it wiser to go for many of these cheap boxes with
8 GB of RAM, or should I instead focus on machines that can deliver more I/O
throughput? (Judging by the mpstat output, %iowait averages around 1.5%, so
the disks do not look like the bottleneck right now.)
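
If per-disk numbers would help, I can post extended iostat output from the same boxes as well (iostat comes from the same sysstat package as mpstat; I don't have a capture at hand, this is just what I would run):

> iostat -x 2 5   (per-device await and %util, 2 second interval, 5 samples)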

I know these things are hard to answer in general, but perhaps someone has
already drawn some conclusions the pragmatic way.

Kindly

//Marcus


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
