I'm not sure this sort of problem will be efficient in Hadoop, but it's the kind of problem WaveFS [1] is designed for. It propagates intermediate values across the cluster, allowing algorithms to run in parallel while coalescing shared products from distributed calculations, without the need to force all the values onto a single reducer task.
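If you do stay in plain MapReduce, one common workaround is to drive the merge from the client as a chain of jobs: in each round the mapper sends sorted runs 2i and 2i+1 to the same reducer, which merges them into run i, so k runs become one after ceil(log2 k) rounds. Below is a rough sketch only; the runId<TAB>value line format is an assumption for illustration, and the reducer's in-memory sort stands in for a proper streaming merge (which you'd get with a secondary sort on the values):

// Sketch of round-based pairwise merging in plain MapReduce. Assumes each
// input line is "runId<TAB>value" and that runs 0..k-1 are already sorted.
// Each round merges runs 2i and 2i+1 into run i; the driver loops until
// one run remains. The in-memory sort in the reducer is a simplification.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PairwiseMerge {

  public static class PairMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      long runId = Long.parseLong(parts[0]);
      // Runs 2i and 2i+1 share the key i, so one reducer sees both.
      ctx.write(new LongWritable(runId / 2), new Text(parts[1]));
    }
  }

  public static class MergeReducer
      extends Reducer<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable pairId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      List<String> merged = new ArrayList<>();
      for (Text v : values) merged.add(v.toString());
      // Simplification: sort in memory. A real job would use secondary sort
      // so the two runs arrive in order and can be merged as a stream.
      Collections.sort(merged);
      // The default TextOutputFormat writes "pairId<TAB>value", which is
      // exactly the input format the next round's mapper expects.
      for (String v : merged) ctx.write(pairId, new Text(v));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String in = args[0];                       // initial sorted runs
    int rounds = Integer.parseInt(args[2]);    // ceil(log2(run count))
    for (int r = 0; r < rounds; r++) {
      String out = args[1] + "/round-" + r;
      Job job = Job.getInstance(conf, "pairwise-merge-" + r);
      job.setJarByClass(PairwiseMerge.class);
      job.setMapperClass(PairMapper.class);
      job.setReducerClass(MergeReducer.class);
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(in));
      FileOutputFormat.setOutputPath(job, new Path(out));
      if (!job.waitForCompletion(true)) System.exit(1);
      in = out;  // next round consumes this round's output
    }
  }
}

This keeps each reducer's work bounded by the size of two runs per round, at the cost of launching a few sequential jobs, rather than funnelling everything through one final reducer.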
Darren

[1] http://www.gridwavetech.com/blog/

On Sun, 2010-02-28 at 15:15 -0500, [email protected] wrote:
> Hello Everybody,
> I have a small question. I want to know how one would implement
> divide-and-conquer algorithms in Hadoop. For example, suppose I want to
> implement merge sort on 100 lines in Hadoop. There will be 10 mappers,
> each sorting 10 lines. Now comes the tough part.
>
> In the traditional version of merge sort, each piece of 10 lines is
> combined to form 5 pieces of 20 lines. Then each piece of 20 lines is
> combined to form 3 pieces of 40 lines, and so on. I am unable to
> understand how to implement this functionality in the reducer.
>
> Any help would be welcome.
>
> PS: Although the example I have given here is of merge sort, my actual
> problem is something else, so I cannot use the algorithm of external
> merge sort.
>
> Best Regards from Buffalo
>
> Abhishek Agrawal
>
> SUNY-Buffalo
> (716-435-7122)
