Hi, I'm experimenting with the Hadoop configuration to make better use of our cluster's resources, and I'm noticing that CPU usage is sub-optimal. Every machine in the cluster has one quad-core CPU. Our mapred.tasktracker.map.tasks.maximum is set to 2 and our mapred.tasktracker.reduce.tasks.maximum is set to 1, which leaves one core for the database (Cassandra) and the OS.
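For reference, the settings described above would look roughly like this. I'm assuming here that they live in mapred-site.xml; the file name may differ depending on the Hadoop version.

```xml
<!-- Sketch of the current per-TaskTracker slot limits.
     The file name (mapred-site.xml) is an assumption;
     the values are the ones described above. -->
<configuration>
  <!-- At most 2 map tasks run at once on each node -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- At most 1 reduce task runs at once on each node -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

With these values a node never runs more than 2 + 1 = 3 tasks at once, but a node with only map work pending still uses at most 2 of the 3 cores set aside for Hadoop.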
My question is: why are there separate settings for map tasks and reduce tasks? What I think I want is something like mapred.tasktracker.tasks.maximum=3, so that all three of those cores are always available to whichever tasks, map or reduce, are ready to run. Am I missing something?

Thanks,
Sebastien
