For our CPU-bound application, I set the value of mapred.tasktracker.tasks.maximum (number of map tasks per tasktracker) equal to the number of CPUs on a tasktracker. Unfortunately, I think this value has to be set per cluster, not per machine. This is okay for us because our machines have similar hardware, but it might be a problem if your machines have different numbers of CPUs.
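
As a rough sketch of the setting described above: assuming the override lives in the cluster's hadoop-site.xml (my assumption about the file, adjust for your setup) and assuming every tasktracker has 4 CPUs (a made-up number for illustration), the entry would look something like this:

```xml
<!-- hadoop-site.xml: site-specific overrides (sketch only) -->
<configuration>
  <property>
    <!-- property name as given in this message -->
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- hypothetical: 4 CPUs per tasktracker; use your own machines' CPU count -->
    <value>4</value>
  </property>
</configuration>
```

Since the value appears to apply cluster-wide, a cluster with mixed hardware would probably have to pick a single compromise number here.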
I created HADOOP-1245 a long time ago for this problem, but I've since heard that hadoop uses only the cluster value for maps per tasktracker, not the hybrid model I describe. In any case, I never did any work on fixing it because I don't need heterogeneous clusters.

-Michael

On 9/25/07 9:37 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

On 9/25/07 9:27 AM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:

> How does Hadoop handle multi-core CPUs? Does each core run a distinct copy
> of the mapped app? Is this automatic, or need some configuration, or what?

Works fine. You need to tell it how many maps to run per machine. I expect that this can be tuned per machine.

> Or should I just spread Hadoop over some friendly machines already in my
> College, buying nothing? Or both?

You will get interesting results all three ways.