Vasilis,

> I'd like to pass different JVM options for map tasks and different
> ones for reduce tasks. I think it should be straightforward to add
> mapred.mapchild.java.opts, mapred.reducechild.java.opts to my
> conf/mapred-site.xml and process the new options accordingly in
> src/mapred/org/apache/mapreduce/TaskRunner.java. Let me know if you
> think it's more involved than what I described.

In trunk (I haven't checked earlier versions), there are already options named mapreduce.map.java.opts and mapreduce.reduce.java.opts. Strangely, these are not documented in mapred-default.xml, even though mapred.child.java.opts is deprecated in favor of the other two. Please refer to MAPREDUCE-478 for details.

> My question is: if mapred.job.reuse.jvm.num.tasks is set to -1 (always
> reuse), can the same JVM be re-used for different types of tasks? So
> the same JVM being used e.g. first by a map task and then used by
> reduce task. I am assuming this is definitely possible, though I
> haven't verified in the code.

Nope. JVMs are not reused across task types. o.a.h.mapred.JvmManager has the relevant information. There's a JvmManagerForType inner class to which all reuse-related calls are delegated, and it is maintained per task type. In particular, launchJVM, the basic method that either triggers a reuse or spawns a new JVM, operates based on the task type.

> So, if one wants to pass different jvm options to map tasks and
> reduce tasks, perhaps jobs.reuse.jvm.num.task should be set to 1
> (never reuse)?

Given the above, this is not necessary. You can reuse JVMs and pass separate parameters to the respective task types.
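
For illustration, a conf/mapred-site.xml fragment along these lines should combine JVM reuse with per-type options. This is only a sketch: the heap sizes are made-up values, the two *.java.opts names are the trunk options mentioned above, and the reuse property is written under the name used in this thread, which may differ in your version.

```xml
<!-- Sketch only: the -Xmx values are illustrative assumptions, not recommendations. -->
<configuration>
  <!-- JVM options applied to map task JVMs only -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512m</value>
  </property>

  <!-- JVM options applied to reduce task JVMs only -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

  <!-- -1 means reuse a JVM for any number of tasks; reuse stays within one task type -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
</configuration>
```

Since a reused JVM only ever runs tasks of the type it was launched for, the map options and reduce options above should never end up mixed in the same JVM.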