Hi,

> I have a sort job consisting of only the Mapper (no Reducer) task. I want my
> results to contain only the top n records. Is there any way of restricting
> the number of records that are emitted by the Mappers?
>
> Basically I am looking to see if there is an equivalent of achieving
> the behavior similar to LIMIT in SQL queries.

I think I understand your goal. However, the question is aimed at what I think is the wrong solution. Each map() call gets one record as input and only knows about that one record, so there is no way to apply a limit there. If you implement a simple reducer, you can very easily let it stop reading the input iterator after N records and limit the output that way.

Doing it in the reducer also allows you to easily add a concept of "Top N" by using the "Secondary Sort" trick to sort the input before it arrives at the reducer.

HTH

Niels Basjes
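
P.S. A minimal sketch of the reducer-side limit described above, in plain Java so it runs without a cluster. The class LimitingReducerSketch and the method emitFirstN are hypothetical names for illustration, not part of Hadoop. The loop is what would go inside a reduce() method, and the sample values are assumed to have been ordered already, for example by a secondary sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: models the body of a reduce() call
// that stops consuming its value iterator after n records.
class LimitingReducerSketch {

    // Copies at most n values from the iterator to the output and leaves the rest unread.
    static <T> List<T> emitFirstN(Iterator<T> values, int n) {
        List<T> emitted = new ArrayList<>();
        while (emitted.size() < n && values.hasNext()) {
            emitted.add(values.next());
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumption: values arrive in descending order, as a secondary sort could arrange.
        List<Integer> sortedValues = Arrays.asList(98, 87, 75, 61, 40);
        System.out.println(emitFirstN(sortedValues.iterator(), 3)); // prints [98, 87, 75]
    }
}
```

Note that the limit only acts as a global LIMIT if all records pass through one reducer. With several reducers, each one would emit up to N records of its own.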
