Here is the Jira issue, and the beginning of a patch:
https://issues.apache.org/jira/browse/MAPREDUCE-4866

There is indeed a limitation on the byte array size (around Integer.MAX_VALUE). Maybe we could use BigArrays to overcome this limitation? What do you think?

Regards
Olivier

On 6 Dec 2012, at 19:41, Arun C Murthy wrote:

> Olivier,
>
> Sorry, missed this.
>
> The historical reason, if I remember right, is that we used to have a single byte buffer and hence the limit.
>
> We should definitely remove it now since we don't use a single buffer. Mind opening a jira?
>
> http://wiki.apache.org/hadoop/HowToContribute
>
> thanks!
> Arun
>
> On Dec 6, 2012, at 8:01 AM, Olivier Varene - echo wrote:
>
>> anyone ?
>>
>> Begin forwarded message:
>>
>>> From: Olivier Varene - echo <var...@echo.fr>
>>> Subject: ReduceTask > ShuffleRamManager : Java Heap memory error
>>> Date: 4 December 2012 09:34:06 CET
>>> To: mapreduce-user@hadoop.apache.org
>>> Reply-To: mapreduce-user@hadoop.apache.org
>>>
>>> Hi all,
>>> First of all, many thanks for the quality of the work you are doing.
>>>
>>> I am facing a bug with memory management at shuffle time; I regularly get:
>>>
>>> Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1612)
>>>
>>> Reading the code in org.apache.hadoop.mapred.ReduceTask.java, the ShuffleRamManager limits the maximum RAM allocation to Integer.MAX_VALUE * maxInMemCopyUse:
>>>
>>>   maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
>>>                     (int)Math.min(Runtime.getRuntime().maxMemory(),
>>>                                   Integer.MAX_VALUE))
>>>                   * maxInMemCopyUse);
>>>
>>> Why is it so? And why is the value cast down to an int when its raw type is long?
>>>
>>> Does it mean that a Reduce Task cannot take advantage of more than 2 GB of memory?
>>>
>>> To explain my use case a little: I am processing some 2700 maps (each working on a 128 MB block of data), and when the reduce phase starts it sometimes stumbles on Java heap memory issues.
>>>
>>> The configuration is:
>>> java 1.6.0-27
>>> hadoop 0.20.2
>>> -Xmx1400m
>>> io.sort.mb 400
>>> io.sort.factor 25
>>> io.sort.spill.percent 0.80
>>> mapred.job.shuffle.input.buffer.percent 0.70
>>> ShuffleRamManager: MemoryLimit=913466944, MaxSingleShuffleLimit=228366736
>>>
>>> I will decrease mapred.job.shuffle.input.buffer.percent to limit the errors, but I am not fully confident in the scalability of the process.
>>>
>>> Any help would be welcome.
>>>
>>> Once again, many thanks
>>> Olivier
>>>
>>> P.S.: sorry if I misunderstood the code; any explanation would be really welcome.
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
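
To make the cast Olivier is asking about concrete, here is a minimal, hypothetical Java sketch. It is not the actual patch attached to MAPREDUCE-4866; the class and method names (ShuffleMemoryLimitSketch, cappedLimit, uncappedLimit) and the assumed 8 GB heap are made up for illustration. Only maxInMemCopyUse and the general shape of the expression come from the 0.20.2 snippet quoted above. The sketch contrasts the int-based arithmetic, which clamps the heap size to Integer.MAX_VALUE, with the same computation done in long arithmetic.

// Illustrative sketch only -- not the actual MAPREDUCE-4866 patch.
public class ShuffleMemoryLimitSketch {

  // 0.20.2-style arithmetic: the heap size is clamped to Integer.MAX_VALUE
  // and cast to int, so the resulting limit can never exceed ~2 GB.
  static int cappedLimit(long maxHeapBytes, float maxInMemCopyUse) {
    return (int) ((int) Math.min(maxHeapBytes, Integer.MAX_VALUE) * maxInMemCopyUse);
  }

  // Same computation in long arithmetic: the limit scales with the real heap.
  static long uncappedLimit(long maxHeapBytes, float maxInMemCopyUse) {
    return (long) (maxHeapBytes * maxInMemCopyUse);
  }

  public static void main(String[] args) {
    long heap = 8L * 1024 * 1024 * 1024; // pretend Runtime.getRuntime().maxMemory() returned 8 GB
    float use = 0.70f;                   // mapred.job.shuffle.input.buffer.percent
    System.out.println("int-based limit : " + cappedLimit(heap, use));   // ~1.5 GB, capped
    System.out.println("long-based limit: " + uncappedLimit(heap, use)); // ~6.0 GB, no cap
  }
}

With Olivier's settings (-Xmx1400m and a 0.70 buffer percent) both versions give roughly the logged MemoryLimit=913466944, since the heap is below 2 GB; the truncation only starts to matter once the reducer heap exceeds Integer.MAX_VALUE bytes.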