Since moving to hadoop-0.18 we have been seeing many more out-of-memory
failures during the final merge in the reduce phase, especially when dealing
with a large number of records that share the same key.
Typical exception:
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
This did not occur in earlier releases, even though we used a much larger
merge fan-in (io.sort.factor of 500+, versus just 100 now). The tasks also
run with 2GB of heap space.
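For context, this is roughly how the job is configured. The property names
(io.sort.factor, mapred.child.java.opts) are the standard 0.18 ones, but the
driver class, executable path, and exact values below are illustrative, not
our literal setup:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.pipes.Submitter;

    // Hypothetical Pipes driver showing only the settings mentioned above.
    public class PipesJobDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PipesJobDriver.class);
        // Merge fan-in for the reduce-side merge; 500+ on 0.17, 100 now.
        conf.setInt("io.sort.factor", 100);
        // Per-task child JVM heap: 2GB.
        conf.set("mapred.child.java.opts", "-Xmx2048m");
        // C++ executable for the Pipes job (placeholder path).
        Submitter.setExecutable(conf, "bin/my_pipes_binary");
        Submitter.runJob(conf);
      }
    }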
What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?
Are records with the same key getting sorted by size for some reason? That
would cause the large values to be merged at the same time.
Thanks,
Christian