Hey Marcus,

Are you recording the data rates coming out of HDFS? Since you have such low CPU utilization, I'd look at boxes utterly packed with big hard drives (also, why are you using RAID1 for Hadoop? HDFS already replicates blocks, so mirroring just halves your usable space).

You can get 1U boxes with 4 drive bays or 2U boxes with 12 drive bays. Based on the data rates you see, make the call.
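If you aren't collecting those rates already, running something like this on a DataNode while the jobs are going gives a rough picture (just a sketch, assuming the sysstat package is installed; adjust device names to your disks):

  # per-device read/write MB/s and %util, sampled every 2 seconds, 5 samples
  iostat -dmx 2 5

  # HDFS capacity and usage per node, from the NameNode's point of view
  hadoop dfsadmin -report

Very roughly, a single 7200rpm SATA drive sustains on the order of 60-80 MB/s sequential, so a 12-bay 2U box has far more aggregate spindle bandwidth than your RAID1 pair -- compare that against the rates you actually measure.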

On the other hand, what's the argument against running 3x more mappers per box? It seems that your boxes still have plenty of headroom -- there's almost no I/O wait.
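For reference, the per-node map slot count is mapred.tasktracker.map.tasks.maximum in the TaskTracker's config; the value below is only an illustration -- size it so slots times the child-JVM heap (mapred.child.java.opts) still fits in your 8GB:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- illustrative only: roughly 1.5x cores on a quad-core box -->
    <value>6</value>
  </property>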

Brian

On Jun 26, 2009, at 4:43 PM, Marcus Herou wrote:

Hi.

We have a deployment of 10 Hadoop servers and I now need more mapping
capability (no, not just adding more mappers per instance) since I have so many
jobs running. Now I am wondering what I should aim for...
Memory, CPU or disk... "How long is a rope?", perhaps you would say.

A typical server currently runs at about 15-20% CPU on a quad-core
2.4 GHz machine with 8 GB RAM and two 500 GB SATA disks in RAID1.

Some specs below.
mpstat 2 5
Linux 2.6.24-19-server (mapreduce2)     06/26/2009

11:36:13 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:36:15 PM  all   22.82    0.00    3.24    1.37    0.62    2.49    0.00   69.45   8572.50
11:36:17 PM  all   13.56    0.00    1.74    1.99    0.62    2.61    0.00   79.48   8075.50
11:36:19 PM  all   14.32    0.00    2.24    1.12    1.12    2.24    0.00   78.95   9219.00
11:36:21 PM  all   14.71    0.00    0.87    1.62    0.25    1.75    0.00   80.80   8489.50
11:36:23 PM  all   12.69    0.00    0.87    1.24    0.50    0.75    0.00   83.96   5495.00
Average:     all   15.62    0.00    1.79    1.47    0.62    1.97    0.00   78.53   7970.30

What I am thinking is... is it wiser to go for many of these cheap boxes with 8 GB of RAM, or should I instead focus on machines which can give more I/O throughput?

I know these things are hard to answer, but perhaps someone has already drawn
some conclusions the pragmatic way.

Kindly

//Marcus


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
