Hi,
is there a reason why the io.sort.mb setting is capped at a hard-coded
maximum of 2047MB?

MapTask.java, lines 789-791:

      if ((sortmb & 0x7FF) != sortmb) {
        throw new IOException("Invalid \"io.sort.mb\": " + sortmb);
      }
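A minimal sketch of what the quoted check accepts: 0x7FF is 2047, so the mask keeps only the low 11 bits, and any value above 2047 (or any negative value) changes under the mask and is rejected. One plausible motivation, assuming the buffer size is converted to bytes with a left shift into an int (as a single Java byte array's length must be an int), is that 2048MB no longer fits. `isValid` mirrors the snippet; `toBytes` is a hypothetical helper for illustration, not Hadoop code.

```java
public class SortMbCheck {

    // Mirrors the quoted MapTask check: (sortmb & 0x7FF) must equal sortmb,
    // i.e. the value must lie in 0..2047.
    static boolean isValid(int sortmb) {
        return (sortmb & 0x7FF) == sortmb;
    }

    // Hypothetical helper: megabytes to bytes as an int shift.
    // Assumes the byte count is held in an int; bits shifted past 31 are lost.
    static int toBytes(int sortmb) {
        return sortmb << 20;
    }

    public static void main(String[] args) {
        for (int mb : new int[] {100, 2047, 2048, 8192}) {
            System.out.println(mb + " MB -> valid=" + isValid(mb)
                    + ", int bytes=" + toBytes(mb));
        }
    }
}
```

Under that assumption, 2047 maps to 2146435072 bytes, just under Integer.MAX_VALUE, while 2048 overflows to -2147483648 and 8192 wraps to 0, so an 8GB buffer would need more than a simple change to the mask.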

Given that the EC2 High-Memory Quadruple Extra Large Instance has
68.4GB of memory and 8 cores, it would make sense to be able to set
io.sort.mb close to 8GB. I have a map task that outputs
144,586,867 records with an average size of 12 bytes, and a sort buffer
larger than 2047MB would let me avoid the spills that are otherwise
inevitable. I know I can reduce the size of the map inputs to solve the
problem, but 2047MB seems a bit arbitrary given the specs of EC2 instances.
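A rough back-of-the-envelope sketch of the numbers above: 144,586,867 records at 12 bytes each is 1,735,042,404 bytes (about 1654.7MB) of raw output, which alone would fit under 2047MB. The sort buffer also has to hold per-record bookkeeping, so the sketch takes that overhead as an explicit parameter. The 16-byte figure used in `main` is an assumption (it should be checked against the Hadoop version in use), and `rawBytes` and `totalBytes` are hypothetical helpers.

```java
public class SortBufferEstimate {

    static final long BYTES_PER_MB = 1L << 20;

    // Raw serialized map output: record count times average record size.
    static long rawBytes(long records, long avgRecordBytes) {
        return records * avgRecordBytes;
    }

    // Raw data plus an assumed per-record bookkeeping cost supplied by the caller.
    static long totalBytes(long records, long avgRecordBytes, long overheadPerRecord) {
        return records * (avgRecordBytes + overheadPerRecord);
    }

    public static void main(String[] args) {
        long records = 144_586_867L;
        long avg = 12L;

        long raw = rawBytes(records, avg);
        System.out.printf("raw output: %d bytes (%.1f MB)%n",
                raw, (double) raw / BYTES_PER_MB);

        // Hypothetical 16-byte overhead per record (an assumption, not measured).
        long total = totalBytes(records, avg, 16L);
        System.out.printf("with overhead: %d bytes (%.1f MB)%n",
                total, (double) total / BYTES_PER_MB);
        System.out.println("fits in 2047 MB: " + (total <= 2047L * BYTES_PER_MB));
        System.out.println("fits in 8192 MB: " + (total <= 8192L * BYTES_PER_MB));
    }
}
```

With that assumed overhead the total is 4,048,432,276 bytes (about 3860.9MB): too large for a 2047MB buffer, but comfortably within an 8GB one.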

Cheers,
Donovan.
