Hi all, first of all many thanks for the quality of the work you are doing.
I am facing a problem with memory management at shuffle time. I regularly get:

    Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1612)

Reading the code in org.apache.hadoop.mapred.ReduceTask.java, the ShuffleRamManager limits the maximum RAM allocation to Integer.MAX_VALUE * maxInMemCopyUse:

    maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
        (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
        * maxInMemCopyUse);

Why is that so? And why is it cast down to an int when Runtime.getRuntime().maxMemory() returns a long? Does this mean a reduce task cannot take advantage of more than 2 GB of memory for the in-memory shuffle?

To explain my use case a little: I am processing some 2700 maps (each working on a 128 MB block of data), and when the reduce phase starts it sometimes fails with Java heap space errors.

The configuration is:

    java 1.6.0-27
    hadoop 0.20.2
    -Xmx1400m
    io.sort.mb 400
    io.sort.factor 25
    io.sort.spill.percent 0.80
    mapred.job.shuffle.input.buffer.percent 0.70

and the task logs show:

    ShuffleRamManager: MemoryLimit=913466944, MaxSingleShuffleLimit=228366736

I will decrease mapred.job.shuffle.input.buffer.percent to limit the errors, but I am not fully confident about the scalability of the process.

Any help would be welcome. Once again, many thanks,
Olivier

P.S.: Sorry if I misunderstood the code; any explanation would be really welcome.
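P.P.S.: To check my reading of the sizing, here is a minimal standalone sketch (not the Hadoop source) of how I understand the calculation, using my configured 0.70 for mapred.job.shuffle.input.buffer.percent and assuming 0.25 for the single-shuffle segment fraction; the heap value below is only an illustrative figure close to what -Xmx1400m seems to report on my nodes:

    public class ShuffleLimitSketch {
        public static void main(String[] args) {
            // Illustrative value only: roughly what Runtime.getRuntime().maxMemory()
            // reports for -Xmx1400m (the JVM reports somewhat less than the -Xmx value).
            long reportedMaxHeap = 1305000000L;

            float maxInMemCopyUse = 0.70f;       // my mapred.job.shuffle.input.buffer.percent
            float singleShuffleFraction = 0.25f; // assumed single-shuffle segment fraction

            // The heap is clamped to Integer.MAX_VALUE before the percentage is applied,
            // so the in-memory shuffle buffer can never exceed ~2 GB * 0.70,
            // however large -Xmx is.
            int clampedHeap = (int) Math.min(reportedMaxHeap, Integer.MAX_VALUE);
            int memoryLimit = (int) (clampedHeap * maxInMemCopyUse);
            long maxSingleShuffleLimit = (long) (memoryLimit * singleShuffleFraction);

            System.out.println("MemoryLimit           = " + memoryLimit);
            System.out.println("MaxSingleShuffleLimit = " + maxSingleShuffleLimit);
        }
    }

With those inputs it prints values in the same ballpark as the MemoryLimit=913466944 / MaxSingleShuffleLimit=228366736 lines from my task logs, which is what leads me to think the Integer.MAX_VALUE clamp is what caps the buffer.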