Yeah, I think so. For a mapper that is probably not significant, as our map runs usually take minutes. However, we also have it on for combiners (same as the reduce class). There it becomes significant, because a combiner's configure() runs every time for each key (quite a few in our case) at the end of every map task.

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac

Arpit Wanchoo <arpit.wanc...@guavus.com>
06/05/2012 03:56 AM
Please respond to mapreduce-user@hadoop.apache.org
To: "<mapreduce-user@hadoop.apache.org>" <mapreduce-user@hadoop.apache.org>
cc:
Subject: Re: JVM reuse in Map Tasks

Yes, I meant configure(JobConf). I got that point. So that means setup() is called for each mapper even if JVM reuse is enabled.

If I understood correctly, when I initialize a static variable (say var) in setup() and a mapper is started for the second time on the same JVM, var would already be initialized before setup() is called, i.e. it retains its value from the previously run mapper. Is this the way?

Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1, Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon, Haryana.
Mobile Number +91-9899949788

On 04-Jun-2012, at 6:36 PM, GUOJUN Zhu wrote:

For setup(), do you mean configure(JobConf)? We need to deserialize a big object and do some other preparation work on it within configure(). It takes a few seconds and is the same for all tasks. We just declare the object as static and do not recreate it if it is not null. That way, we make sure to create it only once and save the setup time for the rest of the tasks.

Arpit Wanchoo <arpit.wanc...@guavus.com>
06/04/2012 08:12 AM
Please respond to mapreduce-user@hadoop.apache.org
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
cc:
Subject: JVM reuse in Map Tasks

Hi,

I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job. My doubt is regarding the setup() method of the mapper. Is it called for a mapper even if it is using the JVM of a previously run mapper?
If yes, is there any way I can control it or stop it from being called more than once?

Regards,
Arpit Wanchoo
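
For reference, the static-caching approach described in this thread (declare the object static and skip re-creation when it is not null) might look roughly like the sketch below. It is deliberately Hadoop-free so it can be run on its own: Task, BigModel and expensiveLoad() are hypothetical stand-ins, and the no-argument configure() stands in for configure(JobConf) on an old-API mapper, reducer or combiner. The loop in main() only simulates several task instances being created one after another in a single reused JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticSetupSketch {

    /** Hypothetical stand-in for the large deserialized object mentioned in the thread. */
    static final class BigModel {
    }

    /** Hypothetical stand-in for an old-API mapper/reducer/combiner; not a Hadoop class. */
    static final class Task {
        // Static, so it is shared by every Task instance created in the same JVM.
        private static BigModel model;

        // Counts how often the expensive load really ran (for illustration only).
        static final AtomicInteger loads = new AtomicInteger();

        // In a real job this would be configure(JobConf job).
        void configure() {
            synchronized (Task.class) {
                if (model == null) {
                    // First instance in this JVM: pay the setup cost once.
                    model = expensiveLoad();
                }
                // Later instances fall through and reuse the cached object.
            }
        }

        // Assumption: placeholder for the deserialization and preparation work.
        private static BigModel expensiveLoad() {
            loads.incrementAndGet();
            return new BigModel();
        }

        static BigModel current() {
            return model;
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks (or combiner instances) in one reused JVM.
        for (int i = 0; i < 3; i++) {
            new Task().configure();
        }
        System.out.println("expensive loads: " + Task.loads.get());
    }
}
```

Note that configure() is still called for every instance; the null check only makes the repeated calls cheap. The cached value survives between tasks only when they really share a JVM, which in Hadoop 1.x is governed by mapred.job.reuse.jvm.num.tasks. With the default of one task per JVM, each task would start with the static field unset.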