I'm still fairly new to MapReduce, but here are my thoughts on the solution. Use the Item as the Key and the Count as the Value. In the Reducer, sum up all of the Counts for each Item and output Item,sum(Count). To make it more efficient, use the same Reducer as the Combiner; that is safe here because summing is associative and commutative.
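To make the Combiner point concrete, here is a rough plain-Java sketch (no Hadoop classes, so it only imitates the data flow). The class name SumCountsSketch, its methods, and the sample items are made up for illustration. It pre-sums each input split the way a Combiner would, then sums the partial results again the way the Reducer would, using the same function both times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; names here are hypothetical, not Hadoop API.
final class SumCountsSketch {

    // Shared summing logic: add up every count seen for each item.
    static Map<String, Long> sumByItem(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    // Imitates the job: each split is pre-summed (Combiner role),
    // then the partial sums are summed again (Reducer role).
    static Map<String, Long> runJob(List<List<Map.Entry<String, Long>>> splits) {
        List<Map.Entry<String, Long>> partials = new ArrayList<>();
        for (List<Map.Entry<String, Long>> split : splits) {
            partials.addAll(sumByItem(split).entrySet());
        }
        return sumByItem(partials);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> split1 =
                List.of(Map.entry("apple", 3L), Map.entry("pear", 1L), Map.entry("apple", 2L));
        List<Map.Entry<String, Long>> split2 =
                List.of(Map.entry("pear", 4L), Map.entry("apple", 2L));
        System.out.println(runJob(List.of(split1, split2))); // prints {apple=7, pear=5}
    }
}
```

Summing the partial sums gives the same totals as summing the raw pairs, which is why one class can fill both roles.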
Then run a 2nd Job where you map the Count as the Key and the Item as the Value, use 1 Reducer, and Identity Reduce it (i.e. don't do any reducing, just output the Count,Item). The shuffle sorts by Key, so the single Reducer emits the records ordered by Count. The default order is ascending, so for the top N you would sort descending with a custom comparator, or take the last N records.

Aaron Baff | Developer | Telescope, Inc. | www.telescope.tv

-----Original Message-----
From: Neil Ghosh
Sent: Friday, September 10, 2010 3:51 PM
To: James Seigel
Subject: Re: TOP N items

Thanks James,

This gives me only N results for sure, but not necessarily the top N.

I have used the Item as Key and the Count as Value as input to the reducer, and my reducing logic is to sum the counts for a particular item. Now my output comes out grouped but not in order. Do I need to use a custom comparator?

Thanks
Neil

On Sat, Sep 11, 2010 at 2:41 AM, James Seigel wrote:
> Welcome to the land of the fuzzy elephant!
>
> Of course there are many ways to do it. Here is one, it might not be
> brilliant or the right way, but I am sure you will get more :)
>
> Use the identity mapper...
>
> job.setMapperClass(Mapper.class);
>
> then have one reducer....
>
> job.setNumReduceTasks(1);
>
> then have a reducer that has something like this around your reducing
> code...
>
> Counter counter = context.getCounter("ME", "total output records");
> if (counter.getValue() < LIMIT) {
>
>   <do your reducey stuff here>
>
>   context.write(key, value);
>   counter.increment(1);
> }
>
> Cheers
> James.
>
> On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
>
> > Hello,
> >
> > I am new to Hadoop. Can anybody suggest an example or procedure for
> > outputting the TOP N items having the maximum total count, where the
> > input file has an (Item, count) pair in each line?
> >
> > Items can repeat.
> >
> > Thanks
> > Neil
> > http://neilghosh.com

--
Thanks and Regards
Neil
http://neilghosh.com
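On the ordering question raised in this thread, a small plain-Java sketch of what the second, sort-by-count step has to achieve may help. It is not Hadoop code; TopNSketch, topN, and the sample totals are hypothetical names and data. It assumes the per-item totals from the first job are already available, sorts them by count with the largest first, and keeps the first N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; names here are hypothetical, not Hadoop API.
final class TopNSketch {

    // Orders (item, total) pairs by total, largest first, and keeps the first n.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // ties: item name
        });
        int limit = Math.min(Math.max(n, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        Map<String, Long> totals = Map.of("apple", 7L, "pear", 5L, "fig", 9L);
        for (Map.Entry<String, Long> entry : topN(totals, 2)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

In a real job the ordering would more likely come from the shuffle itself, for example a descending sort comparator on the Count key with a single Reducer that stops writing after N records, rather than from an in-memory sort like this one.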
