Thanks James,

This gives me only N results for sure, but not necessarily the top N.
I have used the item as the key and the count as the value for the input to the reducer, and my reducing logic sums the counts for a particular item. Now my output comes out grouped but not in order. Do I need to use a custom comparator?

Thanks
Neil

On Sat, Sep 11, 2010 at 2:41 AM, James Seigel <[email protected]> wrote:

> Welcome to the land of the fuzzy elephant!
>
> Of course there are many ways to do it. Here is one, it might not be
> brilliant or the right way, but I am sure you will get more :)
>
> Use the identity mapper...
>
> job.setMapperClass(Mapper.class);
>
> then have one reducer....
>
> job.setNumReduceTasks(1);
>
> then have a reducer that has something like this around your reducing
> code...
>
> Counter counter = context.getCounter("ME", "total output records");
> if (counter.getValue() < LIMIT) {
>
>     <do your reducey stuff here>
>
>     context.write(key, value);
>     counter.increment(1);
> }
>
> Cheers
> James.
>
> On 2010-09-10, at 3:04 PM, Neil Ghosh wrote:
>
> > Hello,
> >
> > I am new to Hadoop. Can anybody suggest any example or procedure for
> > outputting the TOP N items having the maximum total count, where the
> > input file has an (Item, count) pair in each line?
> >
> > Items can repeat.
> >
> > Thanks
> > Neil
> > http://neilghosh.com
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com

--
Thanks and Regards
Neil
http://neilghosh.com
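
On the comparator question: Hadoop sorts reducer input by key, and here the key is the item, so a custom comparator would reorder items rather than totals. The totals only exist after the sums are computed. One possible approach, given the single reducer suggested in the thread, is to keep a size-N min-heap of (item, total) pairs while reducing and write its contents once at the end (for example from the reducer's cleanup() method). The sketch below shows only that selection logic in plain Java with no Hadoop dependency. The class name TopNSketch and the methods sumCounts and topN are hypothetical names for illustration and are not part of any Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopNSketch {

    // Sums the counts per item, mirroring the reducing logic described in the thread.
    static Map<String, Long> sumCounts(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    // Returns the n entries with the largest totals, largest first.
    // A min-heap capped at n entries keeps memory bounded by n.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the smallest total seen so far
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }
}
```

In a real job, the heap would be a field of the reducer, updated in reduce() after each item's sum is known and emitted once all keys have been seen. The order among items with equal totals is not defined in this sketch. An alternative is a second job that uses the total as the key with a descending sort comparator.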
