Why so many mappers and reducers relative to the number of machines you have? This just causes excess heartache when running the job.
My standard practice is to run a small factor more tasks than I have cores (for instance, 3 tasks on a 2-core machine). In fact, I find it most helpful to let the cluster defaults rule the choice, except in the few cases where I want one reducer, or a few more than the standard 4 reducers. (A rough sketch of that kind of per-job configuration follows below the quoted message.)

On 1/15/08 9:15 AM, "Johan Oskarsson" <[EMAIL PROTECTED]> wrote:

> Hi.
>
> I believe someone posted about this a while back, but it's worth
> mentioning again.
>
> I just ran a job on our 10 node cluster where the input data was
> ~70 empty sequence files; with our default settings this ran about 200
> mappers and 70 reducers.
>
> The job took almost exactly two minutes to finish.
>
> How can we reduce this overhead?
>
> * Pick the number of mappers and reducers in a more dynamic way,
>   depending on the size of the input?
> * JVM reuse: one JVM per job instead of one per task?
>
> Any other ideas?
>
> /Johan
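
For concreteness, here is a rough sketch of what I mean, assuming the old org.apache.hadoop.mapred API of that era. The class name, paths, and identity map/reduce below are placeholders of my own, not anything from this thread. The per-node slot counts themselves come from the tasktracker settings (mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum) in the cluster's hadoop-site.xml, so a job normally only overrides the reducer count:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SmallInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallInputJob.class);
    conf.setJobName("small-input-example");

    // Identity map/reduce just to keep the sketch self-contained.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Leave the mapper count to the cluster defaults (it is driven by
    // the input splits anyway).  Only pin the reducer count in the few
    // cases that need it, e.g. a single reducer for a tiny input.
    conf.setNumReduceTasks(1);

    JobClient.runJob(conf);
  }
}

The mapper count is deliberately left to the framework here, since it follows from the input splits rather than from the cluster size; the reducer count is the one knob a small job usually wants to touch.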