If a mapper of a map/reduce job with combiner has to spill the map output, the
performance degrades significantly
-----------------------------------------------------------------------------------------------------------------
Key: HADOOP-2940
URL: https://issues.apache.org/jira/browse/HADOOP-2940
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.16.0
Reporter: Runping Qi
I have a map/reduce job whose reducers combine a group of values into a single
value. The average reduction rate is about 3 to 1. With the reducer also set as
the combiner, the job takes twice as long to run as it does without a combiner.
This is completely counter-intuitive.
When I looked at the job execution more carefully, I noticed that the longer
execution time was mainly due to a few mappers that generated spills. The final
merge of the spills seems to take much longer with the combiner than without it.
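
For readers unfamiliar with combiners, here is a minimal sketch in plain Java
(no Hadoop classes) of the kind of 3-to-1 reduction described above. It assumes
the reducer sums the values for each key. The report does not say which
aggregation is used, and the class and method names are hypothetical. It only
illustrates the combining step and does not reproduce the spill-merge slowdown.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    /**
     * Collapses all values that share a key into one value by summing them.
     * Summation is an assumption; any associative, commutative reduce
     * function could stand in its place.
     */
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        // TreeMap keeps keys sorted, loosely mirroring sorted map output.
        Map<String, Long> combined = new TreeMap<>();
        for (Map.Entry<String, Long> record : records) {
            combined.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical map output: three records per key, i.e. a 3-to-1 reduction.
        List<Map.Entry<String, Long>> mapOutput = new ArrayList<>();
        for (String key : new String[] {"a", "b"}) {
            for (int i = 0; i < 3; i++) {
                mapOutput.add(new AbstractMap.SimpleEntry<>(key, 1L));
            }
        }

        Map<String, Long> combined = combine(mapOutput);
        System.out.println(mapOutput.size() + " records in, "
                + combined.size() + " records out");
    }
}
```

With the sample data, six input records collapse to two, so one would expect the
combiner to leave less data for the final merge, which is why the observed
slowdown is surprising.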