Hyperthreading is interesting, but I'd put more emphasis on the amount
of RAM you have on your boxes.
The Java VM allocates its entire heap size upfront, which means your node
will start thrashing on RAM if you put too many tasks per node.
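To make the RAM point concrete, here is a rough sketch that checks whether a node's slots fit in physical memory. It assumes every slot can run one task JVM whose heap may grow to heap_mb (the -Xmx passed through mapred.child.java.opts). The reserved_mb figure for the OS and daemons and the per-JVM jvm_overhead_mb are hypothetical inputs you would estimate for your own boxes.

```python
def slots_fit_in_ram(map_slots, reduce_slots, heap_mb, ram_mb,
                     reserved_mb, jvm_overhead_mb=0):
    """Return (fits, needed_mb): whether one task JVM per slot fits in the
    RAM left after reserved_mb (OS, DataNode, TaskTracker, etc.).

    Assumption: each JVM can use its full heap (heap_mb) plus a
    hypothetical non-heap overhead (jvm_overhead_mb)."""
    total_slots = map_slots + reduce_slots
    needed_mb = total_slots * (heap_mb + jvm_overhead_mb)
    available_mb = ram_mb - reserved_mb
    return needed_mb <= available_mb, needed_mb


# 16 GB node with 4 GB held back; 8 map + 4 reduce slots with 1 GB heaps
# and an assumed 200 MB of overhead per JVM needs more RAM than is left.
print(slots_fit_in_ram(8, 4, heap_mb=1024, ram_mb=16384,
                       reserved_mb=4096, jvm_overhead_mb=200))
```

If the check fails, either the slot counts or the per-task heap has to come down before the node starts swapping.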
Arun
On Jan 6, 2011, at 5:51 PM, Adam Phelps wrote:
By scouring various web pages and lists via Google I've found some
general recommendations when it comes to setting the number of map and
reduce slots for a cluster. It seems to come down to setting them to
roughly the number of cores on the machine, minus some if there will be
other processes active (such as HBase region servers), and to set the
per-task memory usage so that the total will stay below that of the
system. Is this a reasonably general heuristic?
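The heuristic as described can be sketched as taking the smaller of a CPU budget and a memory budget. This is a minimal illustration under stated assumptions, not settled advice: the function name and parameters are hypothetical, it returns a combined map+reduce total (how to split it between map and reduce slots is left open), and it counts physical cores rather than hyperthreads. Note that Python's os.cpu_count() reports logical CPUs, so on a hyperthreaded machine it would typically return about twice the physical core count.

```python
def suggest_total_slots(physical_cores, reserved_cores,
                        ram_mb, reserved_mb, heap_mb):
    """Hypothetical helper: total map+reduce slots per node.

    CPU budget: about one slot per physical core, minus cores held back
    for other processes (e.g. an HBase RegionServer).
    Memory budget: how many task heaps of heap_mb fit in the RAM left
    after reserved_mb. The smaller budget wins; at least one slot."""
    cpu_budget = physical_cores - reserved_cores
    mem_budget = (ram_mb - reserved_mb) // heap_mb
    return max(min(cpu_budget, mem_budget), 1)


# 8 physical cores with 2 held back; 24 GB RAM with 8 GB reserved; 1 GB heaps.
print(suggest_total_slots(8, 2, 24576, 8192, 1024))  # CPU-limited: 6
# Same cores with only 8 GB RAM and 4 GB reserved.
print(suggest_total_slots(8, 0, 8192, 4096, 1024))   # memory-limited: 4
```

The second call illustrates the reply's point: on a node with little RAM, memory rather than core count ends up setting the slot limit.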
One thing I haven't been able to find advice on is whether this
heuristic should be adjusted for machines that have hyperthreading
enabled. My thought is that it wouldn't be beneficial to increase the
number of slots (especially in a CPU-bound application) as slots equal
to the # of cores would already be fully utilizing the CPU. Are there
alternative thoughts regarding that?
- Adam