This topic was discussed two years ago; see https://issues.apache.org/jira/browse/HADOOP-3420

On Fri, Jun 11, 2010 at 8:45 AM, Edward Capriolo <[email protected]> wrote:

> On Fri, Jun 11, 2010 at 8:35 AM, Sébastien Rainville <[email protected]> wrote:
>
> > Hi,
> >
> > I'm playing around with the hadoop config to optimize the resources of
> > our cluster. I'm noticing that the cpu usage is sub-optimal. All the
> > machines in the cluster have 1 quad core cpu. I looked at our
> > mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum settings and the max map tasks
> > is set to 2 and the max reduce tasks is set to 1, keeping 1 cpu for
> > running the database (Cassandra) and the OS.
> >
> > My question is: why separating the settings for the map tasks and
> > reduce tasks? I feel like what I want is to set
> > mapred.tasktracker.tasks.maximum=3, so that all the cpus are always
> > available for both map and reduce tasks.
> >
> > Am I missing something?
> >
> > Thanks,
> > Sebastien
>
> That suggestion makes sense. As you run more concurrent jobs you may find
> that having dedicated slots for reduce tasks is useful. You would not want
> a cluster running 600 mappers and 0 reducers :)
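
For reference, the per-node slot limits described in the original message would look roughly like this in the TaskTracker configuration. This is only a sketch: the two property names and the values 2 and 1 come from the message above, but the file name (assumed here to be mapred-site.xml) depends on the Hadoop version in use. The combined mapred.tasktracker.tasks.maximum setting is the poster's proposal and is not shown.

```xml
<?xml version="1.0"?>
<!-- Assumed to live in conf/mapred-site.xml on each TaskTracker node -->
<configuration>
  <!-- At most 2 map tasks run concurrently on this node -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- At most 1 reduce task runs concurrently on this node -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

With these values, 3 of the 4 cores can be busy with tasks at once, but only when both map and reduce work is pending; a map-only phase uses at most 2.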
