Hi Donovan,

This is sort of tangential to your question, but we tried upping our io.sort.mb to a really high value and it actually resulted in slower performance (I think we bumped it up to 1400MB, and that was slower than leaving it at 256 on a machine with 32GB of RAM). I'm not entirely sure why; it could have been a garbage collection issue or some other secondary effect slowing things down.
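For what it's worth, we were setting it per job rather than cluster-wide. A rough sketch with the old mapred API (the job name below is made up; the property keys are the real ones):

import org.apache.hadoop.mapred.JobConf;

public class SortBufferConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setJobName("sort-buffer-test");  // hypothetical job name
        // Sort buffer size in MB -- the knob we experimented with.
        conf.setInt("io.sort.mb", 256);
        // Fraction of the buffer that fills before a background
        // spill to disk starts (default 0.80).
        conf.setFloat("io.sort.spill.percent", 0.80f);
    }
}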
Keep in mind that Hadoop will always spill map outputs to disk, no matter how large your sort buffer is: if a reduce attempt fails, the data needs to exist on disk somewhere for the next attempt to fetch, so it might be counterproductive to try to "eliminate" spills.

~Ed

On Tue, Oct 19, 2010 at 8:02 AM, Donovan Hide <[email protected]> wrote:
> Hi,
> is there a reason why the io.sort.mb setting is hard-coded to the
> maximum of 2047MB?
>
> MapTask.java 789-791
>
> if ((sortmb & 0x7FF) != sortmb) {
>     throw new IOException("Invalid \"io.sort.mb\": " + sortmb);
> }
>
> Given that the EC2 High-Memory Quadruple Extra Large Instance has
> 68.4GB of memory and 8 cores, it would make sense to be able to set
> io.sort.mb to close to 8GB. I have a map task that outputs
> 144,586,867 records of average size 12 bytes, and a sort buffer
> greater than 2047MB would allow me to prevent the inevitable spills.
> I know I can reduce the size of the map inputs to solve the problem,
> but 2047MB seems a bit arbitrary given the spec of EC2 instances.
>
> Cheers,
> Donovan.
>
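P.S. On the 2047 limit itself: 0x7FF is 2047 in decimal, so that check rejects any value that needs more than 11 bits. My guess is the cap exists because the sort buffer is a single Java byte[], which is indexed by int and so can't exceed ~2GB; 2047MB is the largest whole-megabyte size that still fits. A standalone illustration of the arithmetic (not Hadoop code):

// Demo of the check from MapTask.java -- 0x7FF == 2047, so any
// sortmb value needing more than 11 bits fails the test.
public class SortMbCheck {
    static boolean valid(int sortmb) {
        return (sortmb & 0x7FF) == sortmb;
    }

    public static void main(String[] args) {
        System.out.println(valid(2047)); // true  -- largest accepted value
        System.out.println(valid(2048)); // false -- needs a 12th bit
        System.out.println(valid(8192)); // false -- so ~8GB is rejected
    }
}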
