Hi, > I have a small question. I want to know how would one implement > divide and conquer algorithms in Hadoop. For example suppose I want to > implement > merge sort 100 lines in hadoop. There will be 10 mapper each sorting 10 lines. > Now comes the tough part > > In the traditional version of merge sort each piece of 10 lines is combined to > form 5 pieces of 20 lines. The each piece of 20 lines is combined to form 3 > pieces of 40 lines and so on. I am unable to understand how to implement this > functionality in the reducer. > > Any help would be welcome
You don't have to implement merge sort in reducer - it happens sort of automatically between mapper and reducer. All the data gets sorted due to grouping / partitioning needed to run reducers. If you really mean some more generic "divide and conquer" problem, then, I guess, Hadoop is not a best choice for parallelizing it. You might want to solve it using several sequential Hadoop jobs, but it's not very effective and it's pretty limited in terms of scalability. -- WBR, Mikhail Yakshin
