On Sep 25, 2007, at 10:09 AM, Michael Bieniosek wrote:

For our CPU-bound application, I set the value of mapred.tasktracker.tasks.maximum (the maximum number of tasks a tasktracker will run concurrently) equal to the number of CPUs on a tasktracker. Unfortunately, I think this value has to be set per cluster, not per machine. That's okay for us because our machines have similar hardware, but it could be a problem if your machines have different numbers of CPUs.
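
For anyone looking for the knob, here is a minimal hadoop-site.xml sketch. The property name is the one from this thread; the value of 4 is just an assumption matching the quad-core boxes discussed below, and (per HADOOP-1245, mentioned further down) the jobtracker's copy of this setting is the one that applies cluster-wide:

  <?xml version="1.0"?>
  <!-- hadoop-site.xml: site-local overrides of hadoop-default.xml -->
  <configuration>
    <property>
      <!-- Maximum number of tasks one tasktracker runs concurrently.
           4 is an assumed value for a quad-core, CPU-bound workload;
           as noted below, the jobtracker's value wins cluster-wide. -->
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
    </property>
  </configuration>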

I did some experimentation with the number of tasks per machine on a set of quad-core boxes. I couldn't figure out how to change this value without stopping and restarting the cluster, and I also couldn't figure out how to tune it on a per-machine basis (though that didn't matter much for me either).

My test had no reduce phase, so I simply set the reduce count to 1 per machine for all tests. On the quad-core boxes, 5 map tasks per machine actually performed best, though only marginally better than 4 (about 4% faster with just one box in the cluster, 2% with four boxes). At 6 tasks, performance started to trend back in the other direction.

I created HADOOP-1245 a long time ago for this problem, but I've since heard that Hadoop uses only the cluster-wide value for maps per tasktracker, not the hybrid model I describe there. In any case, I never did any work on fixing it because I don't need heterogeneous clusters.

-Michael

On 9/25/07 9:37 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

On 9/25/07 9:27 AM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:


How does Hadoop handle multi-core CPUs? Does each core run a distinct copy of the mapped app? Is this automatic, does it need some configuration, or what?

Works fine. You need to tell it how many maps to run per machine. I expect that this can be tuned per machine.

Or should I just spread Hadoop over some friendly machines already in my College, buying nothing?

Or both?  You will get interesting results all three ways.



