Hi Arpit,

One point worth mentioning, from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:
"If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting time while not doing work. JVM reuse can also be enabled to solve this problem."

Another benefit I can think of: if the implementation needs to build a huge tree in the map phase inside a child JVM, that tree can be kept in memory (for example in a static field) and reused by the later tasks that run in the same JVM, rather than being created again and again.

Cheers,
Subroto Sanyal

On Jun 4, 2012, at 2:12 PM, Arpit Wanchoo wrote:

> Hi
>
> I wanted to check what exactly we gain when JVM reusability is enabled in
> mapped job.
>
> My doubt was regarding the setup() method of mapper. Is it called for a
> mapper even if it is using the JVM for previously run mapper ?
> If yes then is there any way I can control it or stop from being called more
> than once.
>
> Regards,
> Arpit Wanchoo | Sr. Software Engineer
> Guavus Network Systems.
> 6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V,
> Gurgaon,Haryana.
> Mobile Number +91-9899949788
>
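
P.S. To make the tree idea concrete, here is a minimal sketch in plain Java, with no Hadoop dependency so that it runs on its own. All names in it (JvmReuseSketch, SketchMapper, getTree, buildExpensiveTree) are made up for illustration and are not part of any Hadoop API. As far as I know, setup() is still called once for every task even when the JVM is reused, so the sketch does not try to suppress it; it only makes the expensive part idempotent by caching the tree in a static field. In a real job the same call would sit inside Mapper.setup(Context), and on Hadoop 1.x JVM reuse itself is, to my knowledge, enabled through the mapred.job.reuse.jvm.num.tasks property (-1 meaning no limit).

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

class JvmReuseSketch {

    // Counts how often the expensive tree is actually built (illustration only).
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Static, so it lives as long as the JVM and is visible to every task run in it.
    private static TreeMap<String, Integer> sharedTree;

    // Builds the tree on first use; later callers in the same JVM get the cached one.
    // synchronized is only a precaution in case several threads call this at once.
    static synchronized TreeMap<String, Integer> getTree() {
        if (sharedTree == null) {
            sharedTree = buildExpensiveTree();
        }
        return sharedTree;
    }

    // Hypothetical stand-in for the costly construction step.
    private static TreeMap<String, Integer> buildExpensiveTree() {
        BUILD_COUNT.incrementAndGet();
        TreeMap<String, Integer> tree = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            tree.put("key" + i, i);
        }
        return tree;
    }

    // Stand-in for a mapper: setup() runs once per task, but the tree is built once per JVM.
    static class SketchMapper {
        private TreeMap<String, Integer> tree;

        void setup() {
            tree = getTree();
        }

        Integer lookup(String key) {
            return tree.get(key);
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks running one after another in a reused JVM.
        for (int task = 0; task < 3; task++) {
            SketchMapper mapper = new SketchMapper();
            mapper.setup();
            System.out.println("task " + task + " -> " + mapper.lookup("key42"));
        }
        System.out.println("tree built " + BUILD_COUNT.get() + " time(s)");
    }
}
```

Each simulated task calls setup(), yet the tree is constructed only on the first call. Without JVM reuse every task starts in a fresh JVM, the static field starts out null, and the tree is rebuilt each time, which is exactly the cost that reuse avoids.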