Hi Abhishek,

If you use input lines as your output keys in map, Hadoop internals will do the work for you and the keys will appear in sorted order in your reduce (you can use IdentityReducer). This needs a slight adjustment if your input lines aren't unique.
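
To make the idea above concrete, here is a minimal plain-Java sketch that only simulates it; it does not use any Hadoop classes. The names SortSketch, mapAndShuffle, reduce and sortLines are made up for illustration. A TreeMap stands in for the framework's sort between map and reduce, and the per-key count is just one possible way to handle non-unique lines, not necessarily the adjustment meant above. The ordering here is Java's natural String ordering, which may differ from the comparator your Hadoop key type uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of "use the input line as the map output key".
// Hypothetical names throughout; no Hadoop API is involved.
final class SortSketch {

    // "Map" phase: emit each input line as a key with a count of 1.
    // The TreeMap plays the role of the sort/shuffle step, which delivers
    // keys to the reducer in sorted order with equal keys grouped together.
    static TreeMap<String, Integer> mapAndShuffle(List<String> lines) {
        TreeMap<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.merge(line, 1, Integer::sum);
        }
        return grouped;
    }

    // "Reduce" phase: write each key once per occurrence, so duplicate
    // input lines are kept in the output (assumed handling of duplicates).
    static List<String> reduce(TreeMap<String, Integer> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : grouped.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    // Whole pipeline with a single simulated reducer.
    static List<String> sortLines(List<String> lines) {
        return reduce(mapAndShuffle(lines));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "fig", "apple");
        System.out.println(sortLines(input)); // prints [apple, apple, fig, pear]
    }
}
```

Note that the reduce step does no sorting of its own; it only writes out what it receives, which is the point of letting the framework order the keys.
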
If you have R reducers, this will create R sorted files. If you want a single sorted file, you can merge the R files or use 1 reducer. Another way is to use TotalOrderPartitioner which will ensure all keys in reduce N come after all keys in reduce N-1.

Owen O'Malley and Arun C. Murthy's paper [1] about using Hadoop to win a sorting competition might be of interest to you.

Ed

[1] http://sortbenchmark.org/Yahoo2009.pdf

On Sun, Feb 28, 2010 at 1:53 PM, <[email protected]> wrote:
> Hello,
> I am trying to write a simple sorting application for hadoop. This is what
> I have thought till now. Suppose I have 100 lines of data and 10 mappers,
> each of the 10 mappers will sort the data given to it. But I am unable to
> figure out is how to join these outputs to one big sorted array. In other
> words what should be the code to be written in the reduce ?
>
> Best Regards from Buffalo
>
> Abhishek Agrawal
> SUNY- Buffalo
> (716-435-7122)
