Hi Geoffry,

A good answer to your question will probably require more detail about the
nature of the workload and the main bottlenecks you are seeing with it.


My 2 cents:
If your workload is I/O intensive, adding more disks and increasing the
amount of physical memory on these machines would probably be the least
expensive upgrades to try first. On the Hadoop side, enabling JVM reuse,
compressing map output with the LZO library, tuning the HDFS block size,
and avoiding map-side spills are good starting points.
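
For concreteness, something along these lines is roughly where those knobs
live in the Java API. Treat it as a sketch only: I'm using the older
0.20-style property names (0.21 renamed many of them but should still map
the old ones with deprecation warnings), the class name and the values are
made-up starting points to experiment against your own workload, and
LzoCodec comes from the separate hadoop-lzo package, which is not bundled
with Hadoop for licensing reasons.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper that builds a job with the tuning knobs mentioned above.
public class TunedJobSetup {
  public static Job createJob() throws Exception {
    Configuration conf = new Configuration();

    // Reuse each task JVM for an unlimited number of tasks instead of
    // forking a fresh JVM per task.
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

    // Compress intermediate map output to cut disk and network I/O.
    // Requires the hadoop-lzo codec to be installed on the cluster.
    conf.setBoolean("mapred.compress.map.output", true);
    conf.set("mapred.map.output.compression.codec",
             "com.hadoop.compression.lzo.LzoCodec");

    // Bigger sort buffer so map output spills to disk less often; raise the
    // task heap alongside it so the buffer actually fits.
    conf.setInt("io.sort.mb", 256);
    conf.set("mapred.child.java.opts", "-Xmx512m");

    // Larger HDFS block size means fewer, longer-running map tasks over big
    // input files (it only affects files written after the change).
    conf.setLong("dfs.block.size", 128L * 1024 * 1024);

    return new Job(conf, "tuned-job");
  }
}

The same properties can also be set cluster-wide in mapred-site.xml /
hdfs-site.xml instead of per job, which is usually the better place for
the block size.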

HTH,
-Shrinivas

On Thu, Apr 21, 2011 at 12:33 PM, Geoffry Roberts <[email protected]> wrote:

> All,
>
>  I am a developer, not a super networking guy or hardware guy, and new to
> Hadoop.
>
> I'm working on a research project and funds are limited.  I have a compute
> problem where I need to improve the performance of processing large text
> files, and no doubt Hadoop can help if I do things well.
>
> I am cobbling my cluster together, to the greatest extent possible, out of
> spare parts.  I can spend some money, but must do so with deliberation and
> prudence.
>
> I have at my disposal twelve one-time desktop computers:
>
>   - Pentium 4, 3.80 GHz
>   - 2-4 GB of memory
>   - 1 Gigabit NIC
>   - 1 Disk, Serial ATA/150 7,200 RPM
>
> I have installed:
>
>   - Ubuntu 10.10 server (64-bit)
>   - JDK (64-bit)
>   - Hadoop 0.21.0
>
> Processing is still slow.  I am tuning Hadoop, but I'm guessing I should
> also upgrade my hardware.
>
> What will give me the most bang for my buck?
>
>   - Should I bring all machines up to 8 GB of memory, or is 4 GB good
>   enough?  (8 GB is the max.)
>   - Should I double up the NICs and use LACP?
>   - Should I double up the disks and attempt to flow my I/O from one disk
>   to another on the theory that this will minimize contention?
>   - Should I get another switch?  (I have a 10/100, 24-port D-Link and
>   it's about 5 years old.)
>
> Thanks in advance
> --
> Geoffry Roberts
>
