On Apr 10, 2010, at 4:02 PM, Dmitry Pushkarev wrote:
I have a cluster where each node can run up to 8 map tasks (one task
per core). We have now realized that we need to run another type of job
that has much larger memory requirements, which will only allow up to
4 tasks to be run on each node. Is it possible to somehow specify that
each map process of that new job "occupies" two map slots, so that at
most 4 such maps will be launched?
Which MR scheduler are you running?
The CapacityScheduler
(http://hadoop.apache.org/common/docs/r0.20.0/capacity_scheduler.html)
has exactly the feature you are looking for; it's called 'High RAM
jobs'. I'm not sure whether the FairScheduler has this feature, so I'll
let someone more knowledgeable comment on the FS.
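As a rough illustration of the slot arithmetic behind the question (not
the scheduler's actual code), the sketch below assumes each map slot is
sized at a fixed amount of memory and that a task asking for more than
one slot's worth is charged a whole number of slots. The function names
and the 1024 MB slot size are hypothetical, chosen only for the example;
the real configuration property names depend on the Hadoop release and
should be taken from its CapacityScheduler documentation.

```python
import math


def slots_per_task(task_memory_mb, slot_memory_mb):
    """Number of map slots one task is charged for (hypothetical model).

    Memory is rounded up to whole slots, and a task always occupies
    at least one slot.
    """
    return max(1, math.ceil(task_memory_mb / slot_memory_mb))


def max_concurrent_tasks(slots_per_node, task_memory_mb, slot_memory_mb):
    """How many such tasks fit on one node at the same time."""
    return slots_per_node // slots_per_task(task_memory_mb, slot_memory_mb)


if __name__ == "__main__":
    SLOTS_PER_NODE = 8      # one map slot per core, as in the question
    SLOT_MEMORY_MB = 1024   # assumed memory per slot, for illustration

    # Ordinary job: one slot per map task, so 8 tasks per node.
    print(max_concurrent_tasks(SLOTS_PER_NODE, 1024, SLOT_MEMORY_MB))

    # High-memory job: each map task needs two slots' worth of memory,
    # so at most 4 tasks per node.
    print(max_concurrent_tasks(SLOTS_PER_NODE, 2048, SLOT_MEMORY_MB))
```

Under this model, a job whose map tasks request twice the per-slot
memory is limited to 4 concurrent maps on an 8-slot node, which is the
behaviour asked for above.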
Unfortunately, this feature in the CS is available only in
trunk/hadoop-0.21, which hasn't been released yet.
We, at Yahoo!, run a version of hadoop-0.20 which includes a backport
of this feature in the CS:
http://github.com/yahoo/hadoop-common/commits/yahoo-hadoop-0.20.9-stable
Arun