For our CPU-bound application, I set mapred.tasktracker.tasks.maximum (the
number of map tasks per tasktracker) equal to the number of CPUs on each
tasktracker. Unfortunately, I think this value has to be set per cluster, not
per machine. That's fine for us because our machines have similar hardware,
but it might be a problem if your machines have different numbers of CPUs.
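If your Hadoop version does honor a per-machine value (see the caveat above),
one workaround is to generate each tasktracker's local config from its own CPU
count. Below is a minimal sketch, assuming a hadoop-site.xml-style file placed
in each node's conf directory; the function name tasks_max_config is
hypothetical, and os.cpu_count() counts logical CPUs, so hyperthreads are
included.

```python
import os
import xml.etree.ElementTree as ET

# Property name as used in this thread (older Hadoop releases).
PROPERTY = "mapred.tasktracker.tasks.maximum"


def tasks_max_config(cpus=None):
    """Hypothetical helper: return a hadoop-site.xml-style <configuration>
    document that sets the per-tasktracker task limit to `cpus`,
    defaulting to this machine's logical CPU count."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() can return None
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = PROPERTY
    ET.SubElement(prop, "value").text = str(cpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Run on each tasktracker and redirect into its local config file.
    print(tasks_max_config())
```

On a homogeneous cluster like ours this buys nothing over one shared file; it
only matters if the tasktracker's local value is actually used for scheduling.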

I created HADOOP-1245 a long time ago for this problem, but I've since heard
that Hadoop uses only the cluster value for maps per tasktracker, not the
hybrid model I describe in that issue. In any case, I never worked on a fix
because I don't need heterogeneous clusters.

-Michael

On 9/25/07 9:37 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

On 9/25/07 9:27 AM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:

>
> How does Hadoop handle multi-core CPUs? Does each core run a distinct copy
> of the mapped app? Is this automatic, or does it need some configuration?

Works fine.  You need to tell it how many maps to run per machine.  I expect
that this can be tuned per machine.

> Or should I just spread Hadoop over some friendly machines already in my
> college, buying nothing?

Or both?  You will get interesting results all three ways.
