Hello,
I'm currently developing a map/reduce program that emits a fair number of map
output records per input record (around 50-100), and I'm getting OutOfMemory errors:
2008-09-06 15:28:08,993 ERROR org.apache.hadoop.mapred.pipes.BinaryProtocol: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$BlockingBuffer.reset(MapTask.java:564)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:440)
        at org.apache.hadoop.mapred.pipes.OutputHandler.output(OutputHandler.java:55)
        at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:117)
The error is reproducible and occurs at the same map progress percentage every
time; when I emit fewer map outputs per input record, the problem goes away.
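For concreteness, the emit pattern in my mapper looks roughly like the sketch
below (a simplified Pipes example with made-up names, not my actual code):

#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

class ExpandingMapper : public HadoopPipes::Mapper {
public:
  ExpandingMapper(HadoopPipes::TaskContext& context) {}

  void map(HadoopPipes::MapContext& context) {
    // One input record expands into many (here: per-token) output pairs,
    // on the order of 50-100 emits per map() call in my real job.
    std::vector<std::string> parts =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < parts.size(); ++i) {
      context.emit(parts[i], HadoopUtils::toString(1));
    }
  }
};

class SummingReducer : public HadoopPipes::Reducer {
public:
  SummingReducer(HadoopPipes::TaskContext& context) {}

  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main(int argc, char* argv[]) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<ExpandingMapper, SummingReducer>());
}

If I read the stack trace correctly, the error is thrown on the Java side (in
the map task's child JVM, while it buffers the output my C++ process emits),
not inside my own code.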
I have tried editing conf/hadoop-env.sh to increase HADOOP_HEAPSIZE to
2000 MB and to set `export HADOOP_TASKTRACKER_OPTS="-Xms32m -Xmx2048m"`, but the
problem persists at exactly the same place.
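For reference, the relevant lines in conf/hadoop-env.sh now look roughly like this:

export HADOOP_HEAPSIZE=2000
export HADOOP_TASKTRACKER_OPTS="-Xms32m -Xmx2048m"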
Now, my use case doesn't seem particularly unusual; is this a common
problem, and if so, what are the usual ways to work around it?
Thanks in advance for a response!
Regards,
Leon Mergen