Why run so many mappers and reducers relative to the number of machines you
have?  That just causes excess heartache when running the job.

My standard practice is to run a small factor more tasks than the number of
cores I have (for instance, 3 tasks on a 2-core machine).  In fact, I find it
most helpful to let the cluster defaults rule the choice, except in a few
cases where I want a single reducer, or a few more than the standard 4
reducers.
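
For reference, a rough sketch of how I set those counts per job with the old
mapred JobConf API.  The class and job names are just placeholders, and the
mapper/reducer/path setup is omitted; cluster-wide slot limits live in
hadoop-site.xml (mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum, if I remember the property names
right), so I only override the per-job counts when I need to:

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  // Sketch only: cap per-job task counts; mapper, reducer, and
  // input/output paths omitted for brevity.
  public class SmallJobDriver {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(SmallJobDriver.class);
      conf.setJobName("small-input-job");

      // Only a hint -- the actual map count follows the input splits.
      conf.setNumMapTasks(10);

      // Honored exactly; one reducer is plenty for tiny inputs.
      conf.setNumReduceTasks(1);

      JobClient.runJob(conf);
    }
  }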


On 1/15/08 9:15 AM, "Johan Oskarsson" <[EMAIL PROTECTED]> wrote:

> Hi.
> 
> I believe someone posted about this a while back, but it's worth
> mentioning again.
> 
> I just ran a job on our 10 node cluster where the input data was
> ~70 empty sequence files.  With our default settings this ran
> ~200 mappers and ~70 reducers.
> 
> The job took almost exactly two minutes to finish.
> 
> How can we reduce this overhead?
> 
> * Pick number of mappers and reducers in a more dynamic way,
>    depending on the size of the input?
> * JVM reuse, one JVM per job instead of one per task?
> 
> Any other ideas?
> 
> /Johan
