Re: Hadoop: Divide and Conquer Algorithms

Mikhail Yakshin Sun, 28 Feb 2010 13:01:22 -0800

Hi,

> I have a small question. I want to know how one would implement
> divide and conquer algorithms in Hadoop. For example, suppose I want to
> implement merge sort on 100 lines in Hadoop. There will be 10 mappers, each
> sorting 10 lines. Now comes the tough part.
>
> In the traditional version of merge sort, the pieces of 10 lines are combined
> to form 5 pieces of 20 lines. Then the pieces of 20 lines are combined to form
> 3 pieces of up to 40 lines, and so on. I am unable to understand how to
> implement this functionality in the reducer.
>
> Any help would be welcome


You don't have to implement merge sort in the reducer: it happens more
or less automatically between the mapper and the reducer. The framework
sorts all map output by key as part of the partitioning and grouping it
needs in order to feed the reducers.
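
To make that concrete, here is a rough sketch in plain Java (no Hadoop
dependency) of the kind of merge that takes place between the map and
reduce phases: several already-sorted runs are combined into one sorted
stream with a k-way merge. The class and method names are made up for
illustration, and this is a simulation of the idea, not Hadoop's actual
shuffle code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: simulates merging sorted runs,
// it is not taken from the Hadoop code base.
class ShuffleMergeSketch {

    // k-way merge of runs that are each already sorted.
    static List<String> mergeSortedRuns(List<List<String>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the line they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> run = runs.get(head[0]);
            merged.add(run.get(head[1]));
            // Advance within the same run, if it has more lines.
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three sorted runs standing in for the output of three mappers.
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("apple", "kiwi", "pear"),
            Arrays.asList("banana", "mango"),
            Arrays.asList("cherry", "fig", "plum"));
        System.out.println(mergeSortedRuns(runs));
    }
}
```

The point is that all runs are merged in a single pass, so there is no
need for the pairwise 10 -> 20 -> 40 combining steps from the question.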

If you really mean a more generic "divide and conquer" problem,
then, I guess, Hadoop is not the best choice for parallelizing it. You
might solve it with several sequential Hadoop jobs, for instance one
per level of the recursion, but that's not very efficient and it's
pretty limited in terms of scalability.
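
As a rough illustration of the sequential-jobs approach, the plain Java
sketch below (again no Hadoop dependency, all names hypothetical) runs
the pairwise merging from the question in rounds, where each round
stands in for one job. Under that assumption, 10 sorted pieces need 4
rounds (10 -> 5 -> 3 -> 2 -> 1), and in a real cluster each round would
pay job startup cost and another full pass over the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: each call to oneRound() stands in
// for one sequential job in the approach described above.
class SequentialRoundsSketch {

    // Standard two-way merge of two sorted lists.
    static List<Integer> mergeTwo(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // One round merges neighbouring pieces pairwise; an odd piece is carried over.
    static List<List<Integer>> oneRound(List<List<Integer>> pieces) {
        List<List<Integer>> next = new ArrayList<>();
        for (int k = 0; k < pieces.size(); k += 2) {
            next.add(k + 1 < pieces.size()
                ? mergeTwo(pieces.get(k), pieces.get(k + 1))
                : pieces.get(k));
        }
        return next;
    }

    // Number of rounds needed to reduce pieceCount pieces to a single one.
    static int roundsNeeded(int pieceCount) {
        int rounds = 0;
        while (pieceCount > 1) {
            pieceCount = (pieceCount + 1) / 2;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<List<Integer>> pieces = Arrays.asList(
            Arrays.asList(3, 9), Arrays.asList(1, 7), Arrays.asList(2, 8));
        int rounds = 0;
        while (pieces.size() > 1) {
            pieces = oneRound(pieces);
            rounds++;
        }
        System.out.println(pieces.get(0) + " after " + rounds + " rounds");
        System.out.println("rounds for 10 pieces: " + roundsNeeded(10));
    }
}
```

The number of rounds grows with the logarithm of the number of pieces,
and the last rounds have very few merges to run in parallel, which is
one way to see the scalability limit mentioned above.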

-- 
WBR, Mikhail Yakshin
