Thanks Aaron. I employed two jobs and solved the problem. I was just wondering whether there is any way it can be done in a single job, so that disk/network I/O is lower and no temporary storage is required between the first and second jobs.
Neil

On Sat, Sep 11, 2010 at 4:37 AM, Aaron Baff <[email protected]> wrote:
> I'm still fairly new at MapReduce, but here are my thoughts on the solution.
>
> Use the Item as the Key and the Count as the Value. In the Reducer, sum up
> all of the Counts and output Item,sum(Count). To make it more efficient,
> use the same Reducer as the Combiner.
>
> Then run a 2nd job where you map the Count as the Key and the Item as the
> Value, use 1 Reducer, and identity-reduce it (i.e. don't do any reducing,
> just output Count,Item).
>
> Aaron Baff | Developer | Telescope, Inc.
>
> email: [email protected] | office: 424 270 2913 | www.telescope.tv
>
> -----Original Message-----
> From: Neil Ghosh [mailto:[email protected]]
> Sent: Friday, September 10, 2010 3:51 PM
> To: James Seigel
> Cc: [email protected]
> Subject: Re: TOP N items
>
> Thanks James,
>
> This gives me only N results for sure, but not necessarily the top N.
>
> I have used the Item as Key and Count as Value as input to the reducer,
> and my reducing logic is to sum the count for a particular item.
>
> Now my output comes out grouped but not in order.
>
> Do I need to use a custom comparator?
>
> Thanks
> Neil
>
> On Sat, Sep 11, 2010 at 2:41 AM, James Seigel <[email protected]> wrote:
>
> > Welcome to the land of the fuzzy elephant!
> >
> > Of course there are many ways to do it. Here is one; it might not be
> > brilliant or the right way, but I am sure you will get more :)
> >
> > Use the identity mapper...
> >
> >     job.setMapperClass(Mapper.class);
> >
> > then have one reducer...
> >
> >     job.setNumReduceTasks(1);
> >
> > then have a reducer that has something like this around your reducing
> > code:
> >
> >     Counter counter = context.getCounter("ME", "total output records");
> >     if (counter.getValue() < LIMIT) {
> >         // <do your reducey stuff here>
> >         context.write(key, value);
> >         counter.increment(1);
> >     }
> >
> > Cheers
> > James.
> >
> > On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
> >
> > Hello,
> >
> > I am new to Hadoop. Can anybody suggest an example or procedure for
> > outputting the TOP N items having the maximum total count, where the
> > input file has an (Item, count) pair on each line?
> >
> > Items can repeat.
> >
> > Thanks
> > Neil
> > http://neilghosh.com
> >
> > --
> > Thanks and Regards
> > Neil
> > http://neilghosh.com
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com

--
Thanks and Regards
Neil
http://neilghosh.com
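[Editor's note: Aaron's two-job recipe boils down to a group-and-sum pass followed by a sort-by-count pass. Here is that dataflow sketched in plain Java, without the Hadoop API, just to make explicit what each job computes; the class and method names are made up for illustration, not taken from the thread.]

```java
import java.util.*;
import java.util.stream.*;

public class TwoJobTopN {

    // Job 1: group by Item and sum the Counts
    // (what the Reducer, doubling as the Combiner, does per key).
    static Map<String, Long> sumCounts(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    // Job 2: order by Count descending (in Hadoop, the shuffle sorts the
    // Count key, and a descending comparator would be needed) and keep N.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> sums, int n) {
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Note that by default Hadoop sorts keys ascending, so the second job would need a descending key comparator (or a post-processing step) to get the largest counts first.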
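[Editor's note: on Neil's question of doing this in a single job, one common pattern (not proposed in this thread, so treat it as a hedged suggestion) is an in-mapper top-N: each mapper keeps only its local top N candidates, emits them in cleanup(), and a single reducer merges those candidate lists the same way. The bounded min-heap at the core of that pattern, sketched in plain Java with hypothetical names:]

```java
import java.util.*;

public class TopNHeap {

    // Keep a min-heap of at most n entries: the smallest of the current
    // top N sits at the root, so any larger count displaces it.
    // O(log n) per item, and only n entries ever held in memory.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest of the candidates
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        return result;
    }
}
```

Run inside each mapper's cleanup() and once more in a single reducer, this avoids the intermediate storage between the two jobs, at the cost of holding N entries in memory per task.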
