Since moving to hadoop-0.18 we have been seeing many more out-of-memory
failures during the final merge in the reduce phase, especially when dealing
with a large number of records that share the same key.
Typical exception:
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
This did not occur in earlier releases, even though we used a much larger
merge fan-in (io.sort.factor of 500+, versus just 100 now). The tasks also
run with 2GB of heap space.
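For context, this is roughly how the job is configured. The property names
(io.sort.factor, mapred.child.java.opts) are the standard 0.18 ones, but the
driver class, executable path, and exact values below are illustrative, not
our literal setup:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.pipes.Submitter;

    // Hypothetical Pipes driver showing only the settings mentioned above.
    public class PipesJobDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PipesJobDriver.class);
        // Merge fan-in for the reduce-side merge; 500+ on 0.17, 100 now.
        conf.setInt("io.sort.factor", 100);
        // Per-task child JVM heap: 2GB.
        conf.set("mapred.child.java.opts", "-Xmx2048m");
        // C++ executable for the Pipes job (placeholder path).
        Submitter.setExecutable(conf, "bin/my_pipes_binary");
        Submitter.runJob(conf);
      }
    }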
What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?
Are records with the same key getting sorted by size for some reason? That
would cause the large values to be merged at the same time.
Thanks,
Christian