All, I am a developer, not much of a networking or hardware guy, and I'm new to Hadoop.
I'm working on a research project, and funds are limited. I have a compute problem: I need to improve the performance of processing large text files, and no doubt Hadoop can help if I do things well. I am cobbling my cluster together, to the greatest extent possible, out of spare parts. I can spend some money, but must do so with deliberation and prudence.

I have at my disposal twelve retired desktop computers:

- Pentium 4, 3.80 GHz
- 2-4 GB of memory
- 1 Gigabit NIC
- 1 disk, Serial ATA/150, 7,200 RPM

On each I have installed:

- Ubuntu 10.10 server (64-bit)
- JDK (64-bit)
- Hadoop 0.21.0

Processing is still slow. I am tuning Hadoop (the first postscript below shows the sort of thing I've been adjusting), but I'm guessing I should also upgrade my hardware. What will give me the most bang for my buck?

- Should I bring all machines up to 8 GB of memory, or is 4 GB good enough? (8 GB is the max.)
- Should I double up the NICs and use LACP?
- Should I double up the disks and spread my I/O across both, on the theory that this will minimize contention? (See the second postscript for how I think I'd configure that.)
- Should I get another switch? (I have a 24-port 10/100 D-Link, and it's about 5 years old.)

Thanks in advance

--
Geoffry Roberts
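P.S. For reference, here is the sort of thing I have been adjusting so far. The values are illustrative, not a recommendation; I am still experimenting. Note that 0.21.0 has renamed many of these properties to mapreduce.* equivalents, but as far as I can tell the old names are still honored, with deprecation warnings.

  <!-- mapred-site.xml (excerpt): slot counts and per-task heap.
       With a single-core P4 and 2-4 GB of RAM I keep these low
       so the task JVMs do not push the node into swap. -->
  <configuration>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>   <!-- concurrent map tasks per node -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>   <!-- concurrent reduce tasks per node -->
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>   <!-- heap for each task JVM -->
    </property>
    <property>
      <name>io.sort.mb</name>
      <value>100</value>   <!-- map-side sort buffer; must fit in the task heap -->
    </property>
  </configuration>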
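P.P.S. On the two-disk question: from the docs, my understanding is that I would not RAID the disks, but simply list a directory on each one in the relevant properties. The DataNode round-robins block writes across the dfs.data.dir entries, and the TaskTracker spreads intermediate map output across the mapred.local.dir entries. The paths below are hypothetical.

  <!-- hdfs-site.xml (excerpt): one directory per physical disk -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>

  <!-- mapred-site.xml (excerpt): local scratch space on both disks -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>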
