Dear Hadoop mailing list,

Is there any way to make the value list passed to reduce(key, list of values) arrive sorted? Or at least sorted in clusters, that is, made up of consecutive sorted runs such as 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3, 1,1,2,2,2,3,3,3,3,3,3,3?

I had a look at JobConf.setOutputValueGroupingComparator in the Javadoc, and I think it might be the answer, because my impression is that grouping in Hadoop is mostly done by sorting. Am I right?
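
To make the two orderings concrete, here is a small plain-Java sketch of what I mean. It does not use Hadoop at all; the class and method names (ValueOrderCheck, isSorted, countSortedRuns) are made up for illustration, and it assumes integer values and treats a "cluster" as a maximal non-decreasing run.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: it only illustrates the two
// orderings asked about (fully sorted vs. sorted in clusters).
final class ValueOrderCheck {

    // Number of maximal non-decreasing runs in the list.
    // A new run starts wherever a value is smaller than its predecessor.
    static int countSortedRuns(List<Integer> values) {
        if (values.isEmpty()) {
            return 0;
        }
        int runs = 1;
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) < values.get(i - 1)) {
                runs++;
            }
        }
        return runs;
    }

    // True if the whole list is one non-decreasing run (or empty).
    static boolean isSorted(List<Integer> values) {
        return countSortedRuns(values) <= 1;
    }

    public static void main(String[] args) {
        // The example list from the question: three sorted clusters.
        List<Integer> clustered = Arrays.asList(
                1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3,
                1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3);

        System.out.println(isSorted(clustered));        // prints false
        System.out.println(countSortedRuns(clustered)); // prints 3
    }
}
```

The fully sorted case is the one where the run count is 1; in the clustered case I would like the number of runs to be small and each run to be sorted.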
Can anyone help? Also, what would the performance impact of your solution be?

Thanks in advance,
Pi
