Re: TOP N items

James Seigel Fri, 10 Sep 2010 14:12:16 -0700

Welcome to the land of the fuzzy elephant!

Of course there are many ways to do it. Here is one; it might not be brilliant
or the right way, but I am sure you will get more :)


Use the identity mapper...

        job.setMapperClass(Mapper.class);
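The base `Mapper` class passes its input through unchanged, so with the default `TextInputFormat` the reducer would receive (byte offset, line) pairs rather than items as keys. Each line still has to be split into an item and a count somewhere, either in a custom mapper or at the top of the reduce code. Here is a minimal plain-Java sketch of that parsing step. It assumes each line looks like `item,count`; the separator and the `LineParser` name are assumptions, not taken from the question.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical helper: splits one input line of the assumed form "item,count"
// into an (item, count) pair. lastIndexOf lets the item itself contain commas.
public class LineParser {
    public static Map.Entry<String, Long> parse(String line) {
        int comma = line.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected item,count but got: " + line);
        }
        String item = line.substring(0, comma).trim();
        long count = Long.parseLong(line.substring(comma + 1).trim());
        return new AbstractMap.SimpleImmutableEntry<>(item, count);
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> pair = parse("apple, 3");
        System.out.println(pair.getKey() + " -> " + pair.getValue()); // prints "apple -> 3"
    }
}
```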

then have one reducer....

        job.setNumReduceTasks(1);
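With a single reduce task, every key goes to the same reducer. Keys arrive in sorted order, with all of their values grouped together. A plain-Java stand-in for that (no Hadoop dependency) is a `TreeMap` that sums the counts per item, which also handles repeated items. `SingleReducerTotals` is a hypothetical name, and `item,count` lines are assumed.

```java
import java.util.List;
import java.util.TreeMap;

// Plain-Java stand-in (no Hadoop) for what one reducer sees: every key in
// sorted order with all of its values. Summing the values gives each item's
// total count. Input lines are assumed to be "item,count".
public class SingleReducerTotals {
    public static TreeMap<String, Long> totals(List<String> lines) {
        TreeMap<String, Long> sums = new TreeMap<>(); // sorted by key, roughly like the shuffle
        for (String line : lines) {
            int comma = line.lastIndexOf(',');
            String item = line.substring(0, comma).trim();
            long count = Long.parseLong(line.substring(comma + 1).trim());
            sums.merge(item, count, Long::sum); // repeated items accumulate
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear,2", "apple,3", "pear,5", "fig,1");
        System.out.println(totals(input)); // prints {apple=3, fig=1, pear=7}
    }
}
```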

then have a reducer that has something like this around your reducing code...

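        Counter counter = context.getCounter("ME", "total output records");
        if (counter.getValue() < LIMIT) {

            // <do your reducey stuff here>, e.g. sum this key's values
            // (assuming they are LongWritable counts)
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
            counter.increment(1);
        }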
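One caveat with the counter guard: keys reach the single reducer in sorted key order, so it keeps the first LIMIT items by key, not the LIMIT items with the largest totals. A different technique, a bounded min-heap (swapped in here, not taken from the post), keeps only the N biggest totals seen so far. In a Hadoop reducer the heap would be a field filled in `reduce()` and written out from `cleanup(Context)`. Here is a plain-Java sketch using a hypothetical `TopNByTotal` class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Bounded min-heap top-N: keep at most n entries, evicting the smallest total
// whenever the heap grows past n, then sort the survivors largest first.
public class TopNByTotal {
    public static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue()); // smallest total at head
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.add(e);
            if (heap.size() > n) {
                heap.poll(); // drop the current smallest
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed()); // largest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new TreeMap<>(Map.of("apple", 3L, "fig", 1L, "pear", 7L, "kiwi", 4L));
        System.out.println(topN(totals, 2)); // prints [pear=7, kiwi=4]
    }
}
```

Each insertion costs O(log N), and the heap never holds more than N + 1 entries, however many distinct items there are. Ties between equal totals come out in no particular order.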


Cheers
James.



On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:

Hello,

I am new to Hadoop. Can anybody suggest an example or procedure for
outputting the top N items with the maximum total count, where the input file
has an (item, count) pair on each line?

Items can repeat.

Thanks
Neil
http://neilghosh.com

