On Fri, Jun 11, 2010 at 8:35 AM, Sébastien Rainville < [email protected]> wrote:
> Hi,
>
> I'm playing around with the hadoop config to optimize the resources of our
> cluster. I'm noticing that the cpu usage is sub-optimal. All the machines in
> the cluster have 1 quad core cpu. I looked at our
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum settings and the max map tasks is
> set to 2 and the max reduce tasks is set to 1, keeping 1 cpu for running
> the database (Cassandra) and the OS.
>
> My question is: why separating the settings for the map tasks and reduce
> tasks? I feel like what I want is to set mapred.tasktracker.tasks.maximum=3,
> so that all the cpus are always available for both map and reduce tasks.
>
> Am I missing something?
>
> Thanks,
> Sebastien

That suggestion makes sense. As you run more concurrent jobs you may find
that having dedicated slots for reduce tasks is useful. You would not want a
cluster running 600 mappers and 0 reducers :)
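
For reference, the per-node slot settings described in the quoted message would look roughly like the fragment below. This is only a sketch: the two property names and the values 2 and 1 come from the message itself, while the file name (mapred-site.xml on a 0.20-style layout) is an assumption about this cluster.

```xml
<!-- Assumed location: conf/mapred-site.xml on each TaskTracker node. -->
<configuration>
  <!-- Up to 2 map tasks run at once on this node (value from the message). -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- 1 slot is reserved for reduce tasks (value from the message). -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

With these values a node runs at most 3 tasks at a time, leaving the fourth core for Cassandra and the OS, and the reduce slot stays available even when maps are queued up.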
