This is a question about plain MapReduce, so I'm cross-posting the answer.

To get any parallelization, you have to start multiple JVMs in the current Hadoop version. Say you have configured your servers with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum both set to 10; then each node can start up to 20 task JVMs (10 map plus 10 reduce) when you launch a job. Without reuse, a new JVM is started for each new map or reduce task. With reuse, it won't start new JVMs (depending on your exact configs vs. the job).

J-D

On Thu, Jan 21, 2010 at 1:16 PM, Sriram Muthuswamy Chittathoor <srir...@ivycomptech.com> wrote:
> I noticed one thing while my sample mapreduce job was running -- it creates a
> lot of java processes on the slave nodes. Even when I have the "reuse.tasks"
> property set, why does it not use a single JVM? Sometimes I see almost 20
> JVMs running on a single box. What property can I use to keep it from
> spawning this huge number of JVMs?
>
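
For reference, the settings named in the reply above could look roughly like this in mapred-site.xml. This is only a sketch: the value 2 is an arbitrary illustration, not a recommendation, and I am assuming the "reuse.tasks" property in the question refers to mapred.job.reuse.jvm.num.tasks. The two slot maximums are per-TaskTracker settings, so as far as I know they need to be set on each slave node and the TaskTracker restarted before they take effect.

```xml
<configuration>
  <!-- Per-node cap on concurrent map tasks (illustrative value, an assumption) -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Per-node cap on concurrent reduce tasks (illustrative value, an assumption) -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Job-level JVM reuse: 1 means no reuse, -1 means no limit on tasks per JVM -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
</configuration>
```

With these values a node would run at most 4 task JVMs at once (2 map plus 2 reduce). Reuse only cuts down on how often JVMs are restarted between tasks of the same job; it does not lower the number running in parallel, which is set by the two slot maximums.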