Hi, all.
I'm debugging Hadoop's source code and have found a puzzling setting.
In hadoop-0.20.2 and hadoop-1.0.3, the reducer's shuffle buffer size cannot
exceed 2048MB (i.e., Integer.MAX_VALUE bytes). As a result, even if the
reducer's JVM heap is set to more than 2048MB (e.g.,
mapred.child.java.opts=-Xmx4000m), the heap available for the shuffle buffer
is at most "2048MB * maxInMemCopyUse (default 0.7)", not "4000MB *
maxInMemCopyUse".
I don't think this is a reasonable setting for machines with large memory.
The following code, taken from "org.apache.hadoop.mapred.ReduceTask",
shows the concrete logic.
I'm wondering why maxSize is declared as "long" but assigned a value that is
cast to "(int)". The corresponding lines are marked with "-->", and a rough
sketch of what I expected instead follows the snippet.
Thanks for any suggestions.
---------------------------------------------------------------------------------------------------------------------------
private final long maxSize;
private final long maxSingleShuffleLimit;

private long size = 0;

private Object dataAvailable = new Object();
private long fullSize = 0;
private int numPendingRequests = 0;
private int numRequiredMapOutputs = 0;
private int numClosed = 0;
private boolean closed = false;

public ShuffleRamManager(Configuration conf) throws IOException {
  final float maxInMemCopyUse =
    conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
  if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
    throw new IOException("mapred.job.shuffle.input.buffer.percent" +
                          maxInMemCopyUse);
  }
  // Allow unit tests to fix Runtime memory
--> maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
-->                 (int)Math.min(Runtime.getRuntime().maxMemory(),
                                  Integer.MAX_VALUE))
-->                 * maxInMemCopyUse);
  maxSingleShuffleLimit = (long)(maxSize *
                                 MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
  LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
           ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
}
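
For comparison, here is a rough sketch of what I would have expected if the
whole computation stayed in long. This is only my guess to make the question
concrete, not a proposed patch; it reuses the same property names and reads
the total-memory default via Configuration.getLong:

// Sketch only -- my guess at a long-based computation, not actual Hadoop code.
import org.apache.hadoop.conf.Configuration;

public class ShuffleLimitSketch {
  public static long computeMaxSize(Configuration conf) {
    final float maxInMemCopyUse =
      conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
    // Keeping everything in long would let maxSize scale past 2048MB
    // when the reducer's heap is larger than that.
    final long totalMem =
      conf.getLong("mapred.job.reduce.total.mem.bytes",
                   Runtime.getRuntime().maxMemory());
    return (long) (totalMem * maxInMemCopyUse);
  }
}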