On Sep 25, 2007, at 10:09 AM, Michael Bieniosek wrote:
For our CPU-bound application, I set the value of
mapred.tasktracker.tasks.maximum (number of map tasks per
tasktracker) equal to the number of CPUs on a tasktracker.
Unfortunately, I think this value has to be set per cluster, not
per machine. This is okay for us because our machines have similar
hardware, but it might be a problem if your machines have different
numbers of CPUs.
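Concretely, that boils down to something like the following in each
tasktracker's hadoop-site.xml (a sketch based on the property name
above; exact semantics may vary by Hadoop version, and later releases
split this into separate map and reduce maximums):

    <!-- hadoop-site.xml on each tasktracker (sketch) -->
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <!-- allow up to 4 concurrent tasks, one per CPU on a quad-core box -->
      <value>4</value>
    </property>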
I did some experimentation with the number of tasks per machine on a
set of quad-core boxes. I couldn't figure out how to change this
value without stopping and restarting the cluster, and I also
couldn't figure out how to tune it on a per-machine basis (though the
latter didn't matter much in my case).
My test had no reduce phase, so I simply set the reduce count to 1
per machine for all the tests. On the quad-core boxes, 5 map tasks
per machine actually performed best, but only marginally better than
4 map tasks (about 4% faster with just one box in the cluster, 2%
with 4 boxes). Six tasks per machine started to trend back in the
other direction.
I created HADOOP-1245 a long time ago for this problem, but I've
since heard that Hadoop uses only the cluster-wide value for maps per
tasktracker, not the hybrid model I describe there. In any case, I
never did any work on fixing it because I don't need heterogeneous
clusters.
-Michael
On 9/25/07 9:37 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
On 9/25/07 9:27 AM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:
How does Hadoop handle multi-core CPUs? Does each core run a
distinct copy of the mapped app? Is this automatic, or does it need
some configuration, or what?
Works fine. You need to tell it how many maps to run per machine. I
expect that this can be tuned per machine.
Or should I just spread Hadoop over some friendly machines already
in my College, buying nothing?
Or both? You will get interesting results all three ways.