On Sep 25, 2007, at 10:02 AM, Allen Wittenauer wrote:
On 9/25/07 9:27 AM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:
I'm in the market to buy a few machines to set up a small cluster and am wondering what I should consider.
If it helps, we're using quad-core x86s with anywhere from 4 GB to 16 GB of RAM. We've got 4x500 GB SATA drives per box, no RAID; swap and root take a chunk out of each drive, and the rest is used for HDFS and/or MR work.
How many map tasks are you running on each machine, one per core? If so, do you have it set up such that each one is talking to its own dedicated drive?
I've been testing on quad-core machines myself, but I didn't see an obvious way to set things up so that each of the four map tasks talked to its own drive. I also wasn't sure whether it would really matter; the performance tests I ran seemed to indicate that it didn't make much of a difference.
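For what it's worth, the closest the stock configuration gets is listing one directory per drive; Hadoop then spreads data and intermediate output across them round-robin, rather than binding a particular task to a particular drive. A sketch of what that looks like in hadoop-site.xml (property names from the 0.1x-era config; the /d0../d3 mount points are made-up examples for a four-drive box):

```xml
<!-- hadoop-site.xml sketch for a quad-core box with four data drives.
     Mount points /d0../d3 are hypothetical examples. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- one map slot per core -->
    <value>4</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- HDFS round-robins block storage across these directories -->
    <value>/d0/dfs/data,/d1/dfs/data,/d2/dfs/data,/d3/dfs/data</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <!-- intermediate map output is likewise spread across these -->
    <value>/d0/mapred/local,/d1/mapred/local,/d2/mapred/local,/d3/mapred/local</value>
  </property>
</configuration>
```

Note this only balances I/O statistically; there is no knob to pin map task N to drive N, which may be why the per-drive layout didn't show up in benchmarks.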
While you can certainly go a much more heterogeneous route than we have, it should be noted that the more differences there are in the hardware/software layout, the more difficult it is going to be to maintain the machines. This is especially true for large grids, where hand-tuning individual machines just isn't worth the return on effort.
Or should I just spread Hadoop over some friendly machines already in my college, buying nothing?
Given the current lack of a security model in Hadoop and the direction that a smattering of Jiras are heading, "friendly" could go either way: either not friendly enough or too friendly. :)