reduce stop after n records

Henry Helgen Thu, 08 Mar 2012 15:02:50 -0800

I am using the Hadoop 0.20.2 mapreduce API. The program runs fine, just
slower than it could.


I sum values and then use
job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to sort
descending by sum. I need to stop the reducer after outputting the first N
records. This would save the reducer from running over thousands of records
when it only needs the first few records. Is there a solution with the new
mapreduce 0.20.2 API?
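
One possible approach, offered as a sketch rather than a confirmed answer: in the
new API, org.apache.hadoop.mapreduce.Reducer has a run(Context) method whose
default body calls setup, loops on context.nextKey() calling reduce for each key,
and then calls cleanup. A subclass can override run and leave that loop after N
records. The code below only models the loop so it runs without Hadoop on the
classpath: SortedInput, its fields, and the sample data are hypothetical
stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class TopNReduceSketch {

    // Hypothetical stand-in for the few context calls the run loop needs.
    // This is NOT Hadoop's Reducer.Context; it only models keys that arrive
    // already sorted in descending order by sum.
    static class SortedInput {
        private final long[] sums;
        private final String[] labels;
        private int position = -1;
        int keysVisited = 0;                           // how many keys the loop pulled
        final List<String> output = new ArrayList<>(); // records "written" by the reducer

        SortedInput(long[] sums, String[] labels) {
            this.sums = sums;
            this.labels = labels;
        }

        // Advance to the next key; false when the input is exhausted.
        boolean nextKey() {
            if (position + 1 >= sums.length) {
                return false;
            }
            position++;
            keysVisited++;
            return true;
        }

        long currentSum() { return sums[position]; }

        String currentLabel() { return labels[position]; }

        void write(String label, long sum) { output.add(label + "\t" + sum); }
    }

    // Same shape as the default run loop, plus a cap on emitted records.
    // The limit is checked first so no extra key is pulled once the cap is hit.
    static void run(SortedInput context, int limit) {
        int emitted = 0;
        while (emitted < limit && context.nextKey()) {
            context.write(context.currentLabel(), context.currentSum());
            emitted++;
        }
    }

    public static void main(String[] args) {
        SortedInput input = new SortedInput(
                new long[] {90, 75, 40, 12, 3},
                new String[] {"a", "b", "c", "d", "e"});
        run(input, 2);
        for (String record : input.output) {
            System.out.println(record);
        }
        System.out.println("keys visited: " + input.keysVisited);
    }
}
```

Two caveats if this is carried over to a real job. The cap applies per reduce
task, so a global top N needs a single reducer (for example
job.setNumReduceTasks(1)). And the shuffle and sort still complete before the
reducer starts, so the saving is limited to the reduce-side iteration.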

-------------------------------------------------------------------
I notice messages from 2008 about this topic:
http://grokbase.com/t/hadoop/common-user/089420wvkx/stop-mr-jobs-after-n-records-have-been-produced

https://issues.apache.org/jira/browse/HADOOP-3973

The last statement follows, but its link is broken.
"You could do this pretty easily by implementing a custom MapRunnable.
There is no equivalent for reduces. The interface proposed in
HADOOP-1230 would support that kind of application. See:
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapreduce/
Look at the new Mapper and Reducer interfaces."
