By scouring various web pages and lists via Google, I've found some general recommendations for setting the number of map and reduce slots on a cluster. They seem to come down to this: set the slot count to roughly the number of cores on the machine, minus a few if other processes will be active (such as HBase region servers), and set the per-task memory so that the total stays below the system's physical memory. Is this a reasonably general heuristic?
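
To make the heuristic concrete, here is a minimal sketch of the arithmetic, assuming the combined map + reduce slot count is what should track the core count, and that half the usable slots (rounded down) go to reduces. Both are assumptions, not settled advice, and `suggest_slots` is a hypothetical helper. In Hadoop 0.20/1.x the results would feed `mapred.tasktracker.map.tasks.maximum`, `mapred.tasktracker.reduce.tasks.maximum` and the `-Xmx` in `mapred.child.java.opts`. A child JVM uses somewhat more than its heap, so the per-task figure is an upper bound.

```python
def suggest_slots(cores, reserved_cores, total_mem_mb, reserved_mem_mb):
    """Hypothetical helper: derive map/reduce slot counts and a per-task memory cap.

    Assumes total slots = cores minus cores reserved for other daemons
    (e.g. an HBase region server), split roughly evenly between map and reduce.
    """
    usable_cores = cores - reserved_cores
    if usable_cores < 2:
        raise ValueError("need at least two usable cores (one map and one reduce slot)")
    if reserved_mem_mb >= total_mem_mb:
        raise ValueError("reserved memory must be less than total memory")

    # Assumed split: reduces get the smaller half, maps get the remainder.
    reduce_slots = usable_cores // 2
    map_slots = usable_cores - reduce_slots

    # Divide the memory left after the OS and other daemons evenly across all
    # slots, so that every slot running at once stays below physical memory.
    per_task_mem_mb = (total_mem_mb - reserved_mem_mb) // usable_cores
    return map_slots, reduce_slots, per_task_mem_mb


if __name__ == "__main__":
    # Hypothetical node: 8 cores, 24 GB RAM; 1 core and 8 GB kept for the OS
    # and a region server.
    print(suggest_slots(8, 1, 24576, 8192))  # (4, 3, 2340)
```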

One thing I haven't been able to find advice on is whether this heuristic should be adjusted for machines with hyperthreading enabled. My thought is that increasing the number of slots wouldn't help (especially for a CPU-bound application), since slots equal to the number of cores would already fully utilize the CPU. Are there other views on that?
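
Part of the ambiguity is which "core count" the heuristic starts from. Tools such as Python's `os.cpu_count()` report logical CPUs, which on a hyperthreaded machine is typically twice the number of physical cores. The sketch below (a hypothetical helper, with the threads-per-core value assumed to be known, e.g. from the "Thread(s) per core" line of `lscpu` on Linux) just computes both baselines; it does not settle which one the slot count should follow.

```python
import os


def slot_basis(logical_cpus, threads_per_core):
    """Hypothetical helper: return (logical, physical) core counts.

    threads_per_core is an assumed input (2 on a typical hyperthreaded CPU, 1 otherwise).
    """
    if threads_per_core < 1 or logical_cpus % threads_per_core:
        raise ValueError("logical_cpus must be a positive multiple of threads_per_core")
    return logical_cpus, logical_cpus // threads_per_core


if __name__ == "__main__":
    # Hypothetical machine: 8 physical cores with hyperthreading enabled.
    logical, physical = slot_basis(16, 2)
    print(f"logical CPUs: {logical}, physical cores: {physical}")
    # os.cpu_count() counts logical CPUs (it may return None if undetermined).
    print("this machine reports", os.cpu_count(), "logical CPUs")
```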

- Adam
