Re: TOP N items

Neil Ghosh Fri, 10 Sep 2010 15:51:19 -0700

Thanks James,

This gives me only N results for sure, but not necessarily the top N.


I have used the item as the key and the count as the value for the input to the reducer,

and my reducing logic is to sum the counts for a particular item.

Now my output comes out grouped by item, but not in order of count.

Do I need to use a custom comparator?
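
To make the goal concrete, here is a minimal plain-Java sketch of the result being asked for. It is not Hadoop code: the class and method names (TopNSketch, sumCounts, topN) are hypothetical, and it assumes everything fits in memory. It first sums the counts per item (the grouping step described above), then keeps only the N largest totals in a bounded min-heap and lists them in descending order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.PriorityQueue;

// Hypothetical illustration only; no Hadoop classes are used here.
class TopNSketch {

    // Step 1: sum the counts for each item (items may repeat in the input).
    static Map<String, Long> sumCounts(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    // Step 2: keep the n largest totals in a min-heap that never grows past n,
    // then return them ordered from largest to smallest total.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // drop the smallest total seen so far
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleImmutableEntry<>("apple", 3L));
        pairs.add(new AbstractMap.SimpleImmutableEntry<>("pear", 1L));
        pairs.add(new AbstractMap.SimpleImmutableEntry<>("apple", 4L));
        pairs.add(new AbstractMap.SimpleImmutableEntry<>("kiwi", 10L));
        for (Map.Entry<String, Long> e : topN(sumCounts(pairs), 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The summing step alone only groups the items; the ordering by total has to come from a separate selection or sort step, which is the part missing from my current output.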

Thanks
Neil

On Sat, Sep 11, 2010 at 2:41 AM, James Seigel <[email protected]> wrote:

> Welcome to the land of the fuzzy elephant!
>
> Of course there are many ways to do it.  Here is one, it might not be
> brilliant or the right way, but I am sure you will get more :)
>
> Use the identity mapper...
>
>         job.setMapperClass(Mapper.class);
>
> then have one reducer....
>
>         job.setNumReduceTasks(1);
>
> then have a reducer that has something like this around your reducing
> code...
>
>         Counter counter = context.getCounter("ME", "total output records");
>         if (counter.getValue() < LIMIT) {
>
>      <do your reducey stuff here>
>
>             context.write(key, value);
>             counter.increment(1);
>         }
>
>
> Cheers
> James.
>
>
>
> On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
>
> Hello,
>
> I am new to Hadoop. Can anybody suggest an example or procedure for
> outputting the TOP N items having the maximum total count, where the input file has
> an (Item, count) pair on each line?
>
> Items can repeat.
>
> Thanks
> Neil
> http://neilghosh.com
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com
>
>
>


-- 
Thanks and Regards
Neil
http://neilghosh.com
