More precise documentation for setOutputValueGroupingComparator
---------------------------------------------------------------
Key: MAPREDUCE-2148
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2148
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.20.2
Reporter: Jingguo Yao
The Javadoc of the JobConf#setOutputValueGroupingComparator method explains
the usage of a comparator for grouping keys.
org.apache.hadoop.examples.SecondarySort uses such a comparator. In
SecondarySort, both parts of an IntPair are used for key sorting, but only the
first part is used for partitioning and grouping. When the first parts of
several IntPairs are equal, the IntPairs themselves are often not equal,
because their second parts differ. These IntPairs are still grouped into a
single invocation of the reduce method, since the grouping comparator only
compares the first part. However, the reduce method accepts only a single key
object. In this situation, the first IntPair of the group is used as the key
in the reduce method.
I have checked the source code of Task.ValuesIterator whose logic is
consistent with the above behaviour.
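To make the behaviour concrete, here is a minimal plain-Java sketch that
simulates it. It does not use Hadoop at all: the IntPair, ReduceCall, and
simulate names below are hypothetical stand-ins written for this illustration,
not the classes from SecondarySort or Task.ValuesIterator. It sorts keys on
both parts, groups them on the first part only, and records which key each
simulated reduce call would receive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {
    /** Hypothetical stand-in for SecondarySort's IntPair (not the Hadoop class). */
    static final class IntPair {
        final int first;
        final int second;
        IntPair(int first, int second) { this.first = first; this.second = second; }
        @Override public String toString() { return "(" + first + "," + second + ")"; }
    }

    /** One simulated reduce() invocation: the key it receives and the keys grouped under it. */
    static final class ReduceCall {
        final IntPair key;
        final List<IntPair> members = new ArrayList<>();
        ReduceCall(IntPair key) { this.key = key; }
    }

    // Sort comparator: orders keys by both parts.
    static final Comparator<IntPair> SORT =
        Comparator.<IntPair>comparingInt(p -> p.first).thenComparingInt(p -> p.second);

    // Grouping comparator: looks at the first part only.
    static final Comparator<IntPair> GROUP =
        Comparator.<IntPair>comparingInt(p -> p.first);

    /** Sorts the keys, then starts a new reduce call whenever the grouping comparator reports a change. */
    static List<ReduceCall> simulate(List<IntPair> mapOutputKeys) {
        List<IntPair> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);
        List<ReduceCall> calls = new ArrayList<>();
        ReduceCall current = null;
        for (IntPair p : sorted) {
            if (current == null || GROUP.compare(current.key, p) != 0) {
                current = new ReduceCall(p); // first key of the group becomes the reduce key
                calls.add(current);
            }
            current.members.add(p);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<IntPair> keys = Arrays.asList(
            new IntPair(2, 7), new IntPair(1, 9), new IntPair(1, 3), new IntPair(2, 5));
        for (ReduceCall c : simulate(keys)) {
            System.out.println("reduce key=" + c.key + " group=" + c.members);
        }
        // prints:
        // reduce key=(1,3) group=[(1,3), (1,9)]
        // reduce key=(2,5) group=[(2,5), (2,7)]
    }
}
```

In this sketch (1,3) and (1,9) are unequal keys, yet they land in the same
simulated reduce call, and only (1,3) is visible as the key.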
I think this behavior of the grouping comparator should be documented in
JobConf#setOutputValueGroupingComparator.
I am happy to provide a patch if a committer thinks that this is an issue.