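Note that a single reducer isn't always the only solution. If you know your key space boundaries, consider total-order partitioning (Hadoop ships a TotalOrderPartitioner for this). Each reducer then receives one contiguous key range, so the reducer outputs read in partition order are globally sorted, and the job scales across the nodes of the cluster instead of funnelling everything through one reducer.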
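To make the idea concrete, here is a rough sketch in plain Python rather than actual MapReduce code. It only simulates the partitioning logic in a single process. The function names (partition_for, range_partitioned_sort, top_n), the sample values and the boundaries [30, 60] are all made up for illustration and are not part of any Hadoop API.

```python
from bisect import bisect_right


def partition_for(key, boundaries):
    """Return the index of the range partition that owns `key`.

    `boundaries` must be sorted ascending; k boundaries give k + 1 partitions.
    (Hypothetical helper, standing in for what a range partitioner does.)
    """
    return bisect_right(boundaries, key)


def range_partitioned_sort(values, boundaries):
    # "Shuffle": route every value to the partition owning its key range.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        partitions[partition_for(v, boundaries)].append(v)
    # "Reduce": each partition sorts only its own range, independently,
    # which is the part that could run on separate reducers.
    return [sorted(p) for p in partitions]


def top_n(sorted_partitions, n):
    # Partitions are ordered by key range, so walk from the highest range
    # downwards and stop as soon as n values have been collected.
    result = []
    for part in reversed(sorted_partitions):
        for v in reversed(part):
            if len(result) == n:
                return result
            result.append(v)
    return result


values = [42, 7, 93, 15, 68, 3, 88, 51, 29, 76]   # made-up sample data
boundaries = [30, 60]                             # assumed known key boundaries
parts = range_partitioned_sort(values, boundaries)
print(parts)
print(top_n(parts, 5))
```

In a real job the boundaries would come from what you know about the key space (or from sampling the input), and the top N would be read from the last reducer's output first, moving to earlier ones only if it holds fewer than N records.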

On Sat, Feb 2, 2013 at 10:35 AM, praveenesh kumar <praveen...@gmail.com> wrote:
> I am looking for a better solution for this.
>
> 1 way to do this would be to find top N values from each mappers and
> then find out the top N out of them in 1 reducer. I am afraid that
> this won't work effectively if my N is larger than number of values in
> my inputsplit (or mapper input).
>
> Otherway is to just sort all of them in 1 reducer and then do the cat of
> top-N.
>
> Wondering if there is any better approach to do this ?
>
> Regards
> Praveenesh

--
Harsh J