Pre-sort value list in reduce

pi song Mon, 14 Apr 2008 16:25:59 -0700

Dear people on the Hadoop mailing list,

Is there any way to make the value list passed to reduce (key, list of values)
arrive sorted? Or at least sorted in clusters, i.e. made up of runs of sorted
values, e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3, 1,1,2,2,2,3,3,3,3,3,3,3?
I looked at JobConf.setOutputValueGroupingComparator in the javadoc, and I
think it might be the answer, because my impression is that grouping in
Hadoop is mostly done by sorting. Am I right?
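The idea the question points at (often called a "secondary sort") can be sketched in plain Java without Hadoop: move the value into a composite key, sort on the whole key, and group on the natural key alone, so each group's values come out in sorted order. This is only a single-partition simulation of the framework's sort-and-group step, not Hadoop's API; CompositeKey, SORT_ORDER, GROUPING and shuffle are hypothetical names. In the old JobConf API the two comparators would roughly correspond to setOutputKeyComparatorClass and setOutputValueGroupingComparator, and a real job would presumably also need a partitioner on the natural key (setPartitionerClass) so that all composite keys for one natural key reach the same reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortDemo {

    // Hypothetical composite key: the natural (grouping) key plus the value
    // that should appear sorted within each reduce call.
    static final class CompositeKey {
        private final String naturalKey;
        private final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }

        String naturalKey() { return naturalKey; }
        int value() { return value; }
    }

    // Sort comparator: natural key first, then value (the role a class set via
    // JobConf.setOutputKeyComparatorClass would play, as an assumption).
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing(CompositeKey::naturalKey)
                      .thenComparingInt(CompositeKey::value);

    // Grouping comparator: looks at the natural key only, so every composite
    // key with the same natural key lands in one reduce call.
    static final Comparator<CompositeKey> GROUPING =
            Comparator.comparing(CompositeKey::naturalKey);

    // Simulates sort + group for one partition; returns, per natural key,
    // the values in the order a reducer would receive them.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey groupStart = null;
        List<Integer> current = null;
        for (CompositeKey k : sorted) {
            // Start a new group when the grouping comparator sees a different key.
            if (groupStart == null || GROUPING.compare(groupStart, k) != 0) {
                groupStart = k;
                current = new ArrayList<>();
                groups.put(k.naturalKey(), current);
            }
            current.add(k.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 1),
                new CompositeKey("a", 3), new CompositeKey("b", 2));
        // prints "{a=[1, 2, 3], b=[1, 2, 3]}"
        System.out.println(shuffle(mapOutput));
    }
}
```

The design choice is that the framework only ever sorts keys, so getting sorted values means making the value part of the key and then loosening grouping back to the natural key.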


Can anyone help me? And what would the performance impact of your solution be?

Thanks in advance,
Pi

