Hi. We have a deployment of 10 Hadoop servers, and I now need more mapping capacity (no, not just more mappers per instance) since I have so many jobs running. Now I am wondering what I should aim for: memory, CPU or disk. "How long is a piece of string?", perhaps you would say.
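For what it's worth, the numbers below come from sampling a node while a fairly typical job mix is running. Roughly the following commands (mpstat/iostat from sysstat, free from procps; the 2-second interval and 5 samples are just what I happened to pick):

> mpstat 2 5      # overall CPU usage, %iowait, %steal
> iostat -x 2 5   # per-disk utilisation, await and throughput
> free -m         # memory headroom left for the task JVMs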
A typical server is currently using about 15-20% CPU on a quad-core 2.4 GHz machine with 8 GB RAM and 2x 500 GB SATA disks in RAID1. Some specs below.

> mpstat 2 5
Linux 2.6.24-19-server (mapreduce2)   06/26/2009

11:36:13 PM  CPU   %user   %nice    %sys  %iowait    %irq   %soft  %steal   %idle    intr/s
11:36:15 PM  all   22.82    0.00    3.24     1.37    0.62    2.49    0.00   69.45   8572.50
11:36:17 PM  all   13.56    0.00    1.74     1.99    0.62    2.61    0.00   79.48   8075.50
11:36:19 PM  all   14.32    0.00    2.24     1.12    1.12    2.24    0.00   78.95   9219.00
11:36:21 PM  all   14.71    0.00    0.87     1.62    0.25    1.75    0.00   80.80   8489.50
11:36:23 PM  all   12.69    0.00    0.87     1.24    0.50    0.75    0.00   83.96   5495.00
Average:     all   15.62    0.00    1.79     1.47    0.62    1.97    0.00   78.53   7970.30

What I am thinking is: is it wiser to go for many of these cheap boxes with 8 GB of RAM, or should I instead focus on machines that can deliver more I/O throughput? I know these things are hard to answer, but perhaps someone has already drawn some conclusions the pragmatic way.

Kindly
//Marcus

--
Marcus Herou
CTO and co-founder, Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/