Re: Writing a simple sort application for Hadoop

Ed Mazur Sun, 28 Feb 2010 12:25:24 -0800

Hi Abhishek,

If you use input lines as your output keys in map, Hadoop internals
will do the work for you and the keys will appear in sorted order in
your reduce (you can use IdentityReducer). This needs a slight
adjustment if your input lines aren't unique.
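To make the idea concrete, here is a rough plain-Java sketch that
imitates the data flow without using the Hadoop API at all. The class
and method names (SortSketch, mapAndShuffle, reduce) are made up for
illustration, and a TreeMap stands in for the framework's sort and
group step. It takes one possible reading of the duplicate-line
adjustment: emit each key once per grouped value so repeated lines
survive. Note that Java String ordering is used here, which may differ
from the byte ordering of Hadoop's Text keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sort job; this is not Hadoop code.
class SortSketch {
    // Imitates map + shuffle: every input line becomes a key, and the
    // TreeMap keeps keys in sorted order while counting duplicates.
    static TreeMap<String, Integer> mapAndShuffle(List<String> lines) {
        TreeMap<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.merge(line, 1, Integer::sum);
        }
        return grouped;
    }

    // Imitates reduce: write the key once per grouped value, so that
    // duplicate input lines are not collapsed into a single line.
    static List<String> reduce(TreeMap<String, Integer> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : grouped.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("pear", "apple", "fig", "apple");
        // prints [apple, apple, fig, pear]
        System.out.println(reduce(mapAndShuffle(lines)));
    }
}
```

In a real job the sorting and grouping happen inside the framework
between map and reduce; the sketch only shows why no sorting code is
needed in the reducer itself.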

If you have R reducers, this will create R sorted files, one per
reducer. If you want a single sorted file, you can merge the R files
or use 1 reducer. Another way is to use TotalOrderPartitioner, which
will ensure all keys in reducer N come after all keys in reducer N-1,
so the R files concatenated in order are globally sorted.
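The range-partitioning idea can be sketched in plain Java as follows.
This is only an illustration of the principle, not the
TotalOrderPartitioner API: the class RangePartitionSketch, its
methods, and the hard-coded split points are all assumptions for the
example. In a real job the split points would have to be chosen from
the data (for instance by sampling) so that reducers get similar
amounts of work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of range partitioning; not Hadoop code.
class RangePartitionSketch {
    // Picks a reducer for a key: the count of split points <= key.
    // splitPoints must be sorted ascending; R = splitPoints.size() + 1.
    static int partition(String key, List<String> splitPoints) {
        int idx = 0;
        while (idx < splitPoints.size()
                && key.compareTo(splitPoints.get(idx)) >= 0) {
            idx++;
        }
        return idx;
    }

    // Routes keys to reducers by range, sorts each reducer's keys
    // independently, then concatenates the outputs in reducer order.
    static List<String> sortWithReducers(List<String> keys,
                                         List<String> splitPoints) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i <= splitPoints.size(); i++) {
            reducers.add(new ArrayList<>());
        }
        for (String key : keys) {
            reducers.get(partition(key, splitPoints)).add(key);
        }
        List<String> out = new ArrayList<>();
        for (List<String> part : reducers) {
            Collections.sort(part); // each "reducer" sorts only its own range
            out.addAll(part);       // concatenation needs no merge step
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("q", "b", "m", "z", "a", "h");
        // Two split points give three reducers: <h, [h,q), >=q
        // prints [a, b, h, m, q, z]
        System.out.println(sortWithReducers(keys, List.of("h", "q")));
    }
}
```

Because every key in one range sorts before every key in the next,
the per-reducer outputs can simply be concatenated instead of merged.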

Owen O'Malley and Arun C. Murthy's paper [1] about using Hadoop to win
a sorting competition might be of interest to you.

Ed

[1] http://sortbenchmark.org/Yahoo2009.pdf

On Sun, Feb 28, 2010 at 1:53 PM,  <[email protected]> wrote:
> Hello,
> I am trying to write a simple sorting application for hadoop. This is
> what I have thought till now. Suppose I have 100 lines of data and 10
> mappers, each of the 10 mappers will sort the data given to it. But I
> am unable to figure out is how to join these outputs to one big sorted
> array. In other words what should be the code to be written in the
> reduce ?
>
>
> Best Regards from Buffalo
>
> Abhishek Agrawal
>
> SUNY- Buffalo
> (716-435-7122)
