I am using the new mapreduce API in Hadoop 0.20.2. The program runs correctly, just more slowly than it could.
I sum the values and then use job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to sort by sum in descending order. I need the reducer to stop after it has written the first N records. That would save it from iterating over thousands of records when it only needs the first few. Is there a way to do this with the new mapreduce API in 0.20.2?

-------------------------------------------------------------------

I found messages from 2008 on this topic:

http://grokbase.com/t/hadoop/common-user/089420wvkx/stop-mr-jobs-after-n-records-have-been-produced
https://issues.apache.org/jira/browse/HADOOP-3973

The last message in that thread says the following, but the link in it is broken:

"You could do this pretty easily by implementing a custom MapRunnable. There is no equivalent for reduces. The interface proposed in HADOOP-1230 would support that kind of application. See: http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapreduce/ Look at the new Mapper and Reducer interfaces."
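Following that hint, one possible approach is to override Reducer.run(Context) in the new API. In 0.20.2, run() is a public method that calls setup(), loops over context.nextKey() calling reduce() for each key, and then calls cleanup(), so a subclass can simply leave that loop early. This is a minimal sketch, not tested on a cluster. The class name TopNReducer, the property name "topn.limit", and the key/value types (the sum as a LongWritable key, the original key as a Text value) are hypothetical choices for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: assumes map output is (sum as LongWritable, original key as Text)
// and that the job sorts keys with LongWritable.DecreasingComparator, so the first
// records this reducer sees are the largest sums.
public class TopNReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

    // Hypothetical configuration property holding N.
    public static final String TOP_N_KEY = "topn.limit";

    private long limit;
    private long emitted = 0;

    @Override
    protected void setup(Context context) {
        // Default to 10 records if the property is not set.
        limit = context.getConfiguration().getLong(TOP_N_KEY, 10L);
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // Same loop as the default Reducer.run(), except that it stops
        // pulling new keys once N records have been written.
        while (emitted < limit && context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
        }
        cleanup(context);
    }

    @Override
    protected void reduce(LongWritable sum, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Several records can share the same sum; stop mid-group if N is reached.
        for (Text value : values) {
            if (emitted >= limit) {
                return;
            }
            context.write(sum, value);
            emitted++;
        }
    }
}
```

In the driver this would be wired up with job.setReducerClass(TopNReducer.class) and job.getConfiguration().setLong(TopNReducer.TOP_N_KEY, n) before submitting. With more than one reduce task, each reducer emits its own first N, so a global top N needs job.setNumReduceTasks(1). Note that the shuffle and sort still process all of the map output; this only avoids calling reduce() on the remaining keys.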