Dear Todd,

I understand the concept, but I don't know how to use a side input in the
mapper class. Could you give me more guidance on that?

Pankil

On Thu, Jul 9, 2009 at 1:39 PM, Todd Lipcon <[email protected]> wrote:

> Hi Pankil,
>
> Basically there are two steps here. The first is to sort the two files.
> This can be done using a MapReduce job where the mapper extracts the join
> column as the key.
>
> If you make sure you have the same number of reducers (and partition by the
> equijoin column) for both sorts, then you'll end up with:
>
> A        B
> part-0  part-0
> part-1  part-1
>
> etc
>
> Each corresponding part file will be in sorted order, and you can perform
> the merge.
>
> To do the merge, you can just pick either A or B as your input for locality
> hints, and then, in the mapper, given the input file name, determine the name
> of the corresponding part file in the other dataset. Open that up as a side
> input in your mapper and perform the merge like you would in a
> non-distributed setting.
>
> Hope this helps
> -Todd
>
>
> On Thu, Jul 9, 2009 at 9:09 AM, Pankil Doshi <[email protected]> wrote:
>
> > Hi,
> >
> > Does anyone have a hint on how to implement a "SORT-MERGE JOIN" using the
> > map-reduce paradigm?
> > I read the article about it on the Pig wiki, but it was not clear to me
> > because it does not express the join in terms of map and reduce.
> >
> > Pankil
> >
>
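
For the merge step in Todd's reply, here is a minimal sketch in plain Python
rather than Hadoop code. It assumes each part file can be read as a stream of
(key, value) pairs already sorted by the join key. The function names
side_input_path and merge_join, and the "A/part-1" style paths, are
hypothetical and only illustrate the idea; in a real mapper the second file
would be opened through the filesystem API instead of being passed in as a
list.

```python
import posixpath
from itertools import groupby
from operator import itemgetter


def side_input_path(input_path, other_dir):
    """Map e.g. 'A/part-1' to 'B/part-1' (hypothetical directory layout)."""
    return posixpath.join(other_dir, posixpath.basename(input_path))


def merge_join(left, right):
    """Join two iterables of (key, value) pairs, each already sorted by key."""
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lkey, lgroup = next(left_groups, (None, None))
    rkey, rgroup = next(right_groups, (None, None))
    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            # Left key has no partner yet: advance the left side.
            lkey, lgroup = next(left_groups, (None, None))
        elif lkey > rkey:
            # Right key has no partner yet: advance the right side.
            rkey, rgroup = next(right_groups, (None, None))
        else:
            # Matching keys: emit every left/right combination for this key.
            right_values = [value for _, value in rgroup]
            for _, left_value in lgroup:
                for right_value in right_values:
                    yield lkey, left_value, right_value
            lkey, lgroup = next(left_groups, (None, None))
            rkey, rgroup = next(right_groups, (None, None))


# The mapper's own input (a part of A) and the side input (same part of B).
a_part = [(1, "a1"), (2, "a2"), (2, "a2b"), (4, "a4")]
b_part = [(2, "b2"), (3, "b3"), (4, "b4"), (4, "b4b")]
joined = list(merge_join(a_part, b_part))
other_path = side_input_path("A/part-1", "B")
```

Only the values for the current key on one side are held in memory, so the
rest of both files can be streamed, which is the point of sorting them first.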

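For the earlier sort step in Todd's reply, the following is a rough
single-process simulation, again in plain Python and not Hadoop code. It
assumes records are tuples and that a CRC32-based function stands in for the
partitioner; partition and sort_into_parts are hypothetical names. What it is
meant to show is that using the same partition function and the same number of
reducers for both datasets puts any given join key in the same part number on
both sides.

```python
import zlib


def partition(key, num_reducers):
    """Deterministic partitioner: the same key always maps to the same part."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def sort_into_parts(records, key_index, num_reducers):
    """Key each record by its join column, partition it, then sort each part."""
    parts = [[] for _ in range(num_reducers)]
    for record in records:
        key = record[key_index]  # "mapper": extract the join column as the key
        parts[partition(key, num_reducers)].append((key, record))
    for part in parts:
        part.sort(key=lambda pair: pair[0])  # each "reducer" sees sorted keys
    return parts


# Dataset A joins on column 0, dataset B joins on column 1.
a = [("u3", "carol"), ("u1", "alice"), ("u2", "bob")]
b = [("order9", "u2"), ("order7", "u1"), ("order8", "u3"), ("order5", "u1")]
a_parts = sort_into_parts(a, 0, 2)
b_parts = sort_into_parts(b, 1, 2)
```

With this layout, a_parts[i] and b_parts[i] play the roles of A/part-i and
B/part-i, and each pair can be merged independently of the others.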