RE: TOP N items

Aaron Baff Fri, 10 Sep 2010 16:08:05 -0700

I'm still fairly new at MapReduce, but here are my thoughts on the solution.

Use the Item as the Key and the Count as the Value. In the Reducer, sum up all of
the Counts and output Item,sum(Count). To make it more efficient, use the same
Reducer as the Combiner.
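The first job can be sketched in memory, in plain Java with no Hadoop classes. The class and method names are hypothetical. The same sum logic pre-sums each split, acting as the Combiner, and one Reducer then sums the partial sums. Reusing the Reducer as the Combiner is safe here because addition is associative and commutative, so pre-summing does not change the totals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation of the first job (not Hadoop API code):
// map emits (item, count), a combiner pre-sums each split, and a single
// reducer sums the partial sums into one total per item.
class SumCountsSketch {

    // The "reduce" logic: sum every count seen for each item.
    // Used both as the combiner (per split) and as the reducer (global).
    static Map<String, Long> sumByItem(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    // Runs the combiner on each split, then feeds all partial sums to one reducer.
    static Map<String, Long> runJob(List<List<Map.Entry<String, Long>>> splits) {
        List<Map.Entry<String, Long>> combined = new ArrayList<>();
        for (List<Map.Entry<String, Long>> split : splits) {
            combined.addAll(sumByItem(split).entrySet());
        }
        return sumByItem(combined);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Long>>> splits = List.of(
            List.of(Map.entry("apple", 3L), Map.entry("pear", 1L), Map.entry("apple", 2L)),
            List.of(Map.entry("pear", 4L), Map.entry("fig", 7L)));
        // totals: apple=5, pear=5, fig=7 (HashMap print order may vary)
        System.out.println(runJob(splits));
    }
}
```

The totals are the same with or without the per-split combine step. The combiner only cuts down how many records reach the reducer.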

Then run a second Job where you map the Count as the Key and the Item as the
Value, use 1 Reducer, and Identity Reduce it (i.e. don't do any reducing, just
output Count,Item).
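The second job can also be sketched as a plain-Java simulation. This is not the Hadoop API, and the names are hypothetical. One assumption is worth flagging: Hadoop sorts map output keys in ascending order by default. With Count as the Key, the largest totals would therefore come out last. This sketch sorts in descending order instead, which is what a descending sort comparator would do in a real job. It also adds a hypothetical limit n, so only the first N records are kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the second job (not Hadoop code): swap
// (item, total) -> (total, item), sort by total, and let a single
// identity-style reducer pass records through in sorted order.
class TopNSketch {

    // One (count, item) output record of the second job.
    static final class CountItem {
        private final long count;
        private final String item;

        CountItem(long count, String item) {
            this.count = count;
            this.item = item;
        }

        long count() { return count; }
        String item() { return item; }

        @Override
        public String toString() { return count + "," + item; }
    }

    // n is a hypothetical limit added for illustration; the original reply
    // emits every pair and leaves picking the first N to the reader.
    static List<CountItem> topN(Map<String, Long> totals, int n) {
        List<CountItem> swapped = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            // Map step: the total becomes the key, the item becomes the value
            swapped.add(new CountItem(e.getValue(), e.getKey()));
        }
        // Stand-in for the shuffle sort, but descending (Hadoop's default is
        // ascending); ties are broken by item name so the output is stable
        swapped.sort(Comparator.comparingLong(CountItem::count).reversed()
                .thenComparing(CountItem::item));
        // Single identity reducer: pass records through in order, stop after n
        return new ArrayList<>(swapped.subList(0, Math.min(n, swapped.size())));
    }

    public static void main(String[] args) {
        Map<String, Long> totals = Map.of("apple", 5L, "pear", 5L, "fig", 7L, "kiwi", 1L);
        System.out.println(topN(totals, 2)); // [7,fig, 5,apple]
    }
}
```

Breaking ties by item name is an arbitrary choice in this sketch. Without a tie-breaker, items with equal counts can come out in any order.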

Aaron Baff | Developer | Telescope, Inc.

email:  [email protected] | office:  424 270 2913 | www.telescope.tv

-----Original Message-----
From: Neil Ghosh [mailto:[email protected]]
Sent: Friday, September 10, 2010 3:51 PM
To: James Seigel
Cc: [email protected]
Subject: Re: TOP N items

Thanks James,

This gives me only N results for sure, but not necessarily the top N.

I have used the Item as the Key and the Count as the Value as input to the
reducer, and my reducing logic sums the counts for each item.

Now my output comes out grouped by item, but not ordered by count.

Do I need to use a custom comparator?

Thanks
Neil
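Neil's observation can be reproduced with a small simulation. This is plain Java with hypothetical names, not Hadoop code. With Item as the Key, a single reducer sees keys in sorted key order, which here means alphabetical item names. A LIMIT check like the counter in James's reducer therefore keeps the first N items by name, not the N largest totals. A TreeMap stands in for Hadoop's sort-and-group phase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation (not Hadoop code) of a single reducer keyed by
// item with a LIMIT cut-off: it yields N results, but not the top N.
class FirstNNotTopN {

    static List<String> firstNByKey(List<Map.Entry<String, Long>> pairs, int limit) {
        // Group and sort by item, as the shuffle would for a single reducer
        TreeMap<String, Long> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            grouped.merge(p.getKey(), p.getValue(), Long::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : grouped.entrySet()) {
            // Mirrors the "counter.getValue() < LIMIT" check: stop emitting at the limit
            if (out.size() >= limit) break;
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> input = List.of(
            Map.entry("zebra", 50L), Map.entry("apple", 1L),
            Map.entry("mango", 2L), Map.entry("zebra", 10L));
        // prints [apple=1, mango=2]; zebra (total 60) is dropped
        System.out.println(firstNByKey(input, 2));
    }
}
```

To get the real top N, the data must be ordered by total count before the limit is applied. The second job in Aaron's reply does exactly that.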

On Sat, Sep 11, 2010 at 2:41 AM, James Seigel <[email protected]> wrote:

> Welcome to the land of the fuzzy elephant!
>
> Of course there are many ways to do it.  Here is one; it might not be
> brilliant or the right way, but I am sure you will get more :)
>
> Use the identity mapper...
>
>         job.setMapperClass(Mapper.class);
>
> then have one reducer...
>
>         job.setNumReduceTasks(1);
>
> then have a reducer that has something like this around your reducing
> code...
>
>         Counter counter = context.getCounter("ME", "total output records");
>         if (counter.getValue() < LIMIT) {
>
>             // <do your reducey stuff here>
>
>             context.write(key, value);
>             counter.increment(1);
>         }
>
>
> Cheers
> James.
>
>
>
> On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
>
> Hello,
>
> I am new to Hadoop. Can anybody suggest an example or procedure for
> outputting the TOP N items with the maximum total count, where the input file
> has an (Item, count) pair on each line?
>
> Items can repeat.
>
> Thanks
> Neil
> http://neilghosh.com
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com
>
>
>