Thanks Aaron. I employed two jobs and solved the problem. I was just wondering whether there is any way it can be done in a single job, so that disk/network I/O is lower and no temporary storage is required between the first and second jobs.
Neil

On Sat, Sep 11, 2010 at 4:37 AM, Aaron Baff <[email protected]> wrote:
> I'm still fairly new at MapReduce, but here are my thoughts on the solution.
>
> Use the Item as the Key and the Count as the Value. In the Reducer, sum up
> all of the Counts and output Item,sum(Count). To make it more efficient,
> use the same Reducer as the Combiner.
>
> Then run a 2nd job where you map the Count as the Key and the Item as the
> Value, use 1 Reducer, and identity-reduce it (i.e. don't do any reducing,
> just output Count,Item).
>
> Aaron Baff | Developer | Telescope, Inc.
>
> email: [email protected] | office: 424 270 2913 | www.telescope.tv
>
> -----Original Message-----
> From: Neil Ghosh [mailto:[email protected]]
> Sent: Friday, September 10, 2010 3:51 PM
> To: James Seigel
> Cc: [email protected]
> Subject: Re: TOP N items
>
> Thanks James,
>
> This gives me only N results for sure, but not necessarily the top N.
>
> I have used the Item as Key and Count as Value as input to the reducer,
> and my reducing logic is to sum the count for a particular item.
>
> Now my output comes out grouped but not in order.
>
> Do I need to use a custom comparator?
>
> Thanks
> Neil
>
> On Sat, Sep 11, 2010 at 2:41 AM, James Seigel <[email protected]> wrote:
>
> > Welcome to the land of the fuzzy elephant!
> >
> > Of course there are many ways to do it. Here is one; it might not be
> > brilliant or the right way, but I am sure you will get more :)
> >
> > Use the identity mapper...
> >
> >     job.setMapperClass(Mapper.class);
> >
> > then have one reducer...
> >
> >     job.setNumReduceTasks(1);
> >
> > then have a reducer that has something like this around your reducing
> > code:
> >
> >     Counter counter = context.getCounter("ME", "total output records");
> >     if (counter.getValue() < LIMIT) {
> >         // <do your reducey stuff here>
> >         context.write(key, value);
> >         counter.increment(1);
> >     }
> >
> > Cheers
> > James.
> >
> > On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
> >
> > Hello,
> >
> > I am new to Hadoop. Can anybody suggest an example or procedure for
> > outputting the TOP N items having the maximum total count, where the
> > input file has an (Item, count) pair on each line?
> >
> > Items can repeat.
> >
> > Thanks
> > Neil
> > http://neilghosh.com
> >
> > --
> > Thanks and Regards
> > Neil
> > http://neilghosh.com
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com

--
Thanks and Regards
Neil
http://neilghosh.com
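[Editor's note: Aaron's two-job recipe boils down to a group-and-sum pass followed by a sort-by-count pass. Here is that dataflow sketched in plain Java, without the Hadoop API, just to make explicit what each job computes; the class and method names are made up for illustration, not taken from the thread.]

```java
import java.util.*;
import java.util.stream.*;

public class TwoJobTopN {

    // Job 1: group by Item and sum the Counts
    // (what the Reducer, doubling as the Combiner, does per key).
    static Map<String, Long> sumCounts(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    // Job 2: order by Count descending (in Hadoop, the shuffle sorts the
    // Count key, and a descending comparator would be needed) and keep N.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> sums, int n) {
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Note that by default Hadoop sorts keys ascending, so the second job would need a descending key comparator (or a post-processing step) to get the largest counts first.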
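[Editor's note: on Neil's question of doing this in a single job, one common pattern (not proposed in this thread, so treat it as a hedged suggestion) is an in-mapper top-N: each mapper keeps only its local top N candidates, emits them in cleanup(), and a single reducer merges those candidate lists the same way. The bounded min-heap at the core of that pattern, sketched in plain Java with hypothetical names:]

```java
import java.util.*;

public class TopNHeap {

    // Keep a min-heap of at most n entries: the smallest of the current
    // top N sits at the root, so any larger count displaces it.
    // O(log n) per item, and only n entries ever held in memory.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest of the candidates
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        return result;
    }
}
```

Run inside each mapper's cleanup() and once more in a single reducer, this avoids the intermediate storage between the two jobs, at the cost of holding N entries in memory per task.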
