Hi Arpit,

One relevant point, quoted from 
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:

If each task takes less than 30-40 seconds, reduce the number of tasks. The 
task setup and scheduling overhead is a few seconds, so if tasks finish very 
quickly, you’re wasting time while not doing work. JVM reuse can also be 
enabled to solve this problem.

Further, if the mapper needs to build a large in-memory structure in the child 
JVM (say the implementation requires a huge tree), it can be kept in a static 
field and reused by later tasks that run in the same JVM, rather than being 
rebuilt for every task.
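
On your setup() question: as far as I know, setup() is still called once for 
every task even when the JVM is reused, because each task gets its own Mapper 
instance. What does survive between tasks in a reused JVM is static state, so 
the usual trick is to guard the expensive work with a static field. Below is a 
rough sketch of that pattern in plain Java with no Hadoop dependency. The 
class TreeCachingTask, its methods, and the sample data are all made up for 
illustration; in a real job the body of setup() would go into your Mapper's 
setup(Context) method. In Hadoop 1.x, JVM reuse itself is switched on with the 
mapred.job.reuse.jvm.num.tasks property (-1 means no limit).

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Mapper; not part of the Hadoop API.
public class TreeCachingTask {

    // Static state lives as long as the JVM does, so it survives from one
    // task to the next when the child JVM is reused.
    private static TreeMap<String, Integer> sharedTree;

    // Counts how many times the expensive build really ran in this JVM.
    private static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Placeholder for the expensive construction step (sample data only).
    private static TreeMap<String, Integer> buildTree() {
        BUILD_COUNT.incrementAndGet();
        TreeMap<String, Integer> tree = new TreeMap<String, Integer>();
        tree.put("a", 1);
        tree.put("b", 2);
        tree.put("c", 3);
        return tree;
    }

    // Plays the role of Mapper.setup(): it runs once per task, but only the
    // first task in a given JVM pays for building the tree.
    public void setup() {
        synchronized (TreeCachingTask.class) {
            if (sharedTree == null) {
                sharedTree = buildTree();
            }
        }
    }

    // Read-only lookup against the shared tree.
    public Integer lookup(String key) {
        return sharedTree.get(key);
    }

    public static int buildCount() {
        return BUILD_COUNT.get();
    }

    // Simulates three tasks running one after another in the same JVM.
    public static void main(String[] args) {
        for (int task = 0; task < 3; task++) {
            TreeCachingTask mapper = new TreeCachingTask();
            mapper.setup();
            System.out.println("task " + task + " sees b=" + mapper.lookup("b"));
        }
        System.out.println("tree built " + buildCount() + " time(s)");
    }
}
```

So you cannot stop setup() from being called, but you can make the repeat 
calls cheap. Treat the shared structure as read-only, since any change made 
by one task would be visible to the next task in that JVM.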

Cheers,
Subroto Sanyal

On Jun 4, 2012, at 2:12 PM, Arpit Wanchoo wrote:

> Hi
> 
> I wanted to check what exactly we gain when JVM reuse is enabled in a 
> MapReduce job.
> 
> My question was about the setup() method of the mapper. Is it called for a 
> mapper even if that mapper runs in a JVM reused from a previously run mapper?
> If yes, is there any way I can control it or stop it from being called more 
> than once?
> 
> Regards,
> Arpit Wanchoo | Sr. Software Engineer
> Guavus Network Systems.
> 6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, 
> Gurgaon,Haryana.
> Mobile Number +91-9899949788
> 
