Hi Pankil,

Simply use the normal FileSystem APIs to open the side input. You can
construct a SequenceFile.Reader from a Path and use that class's normal
methods to read the records.

-Todd

On Thu, Jul 9, 2009 at 11:12 AM, Pankil Doshi <[email protected]> wrote:

> Dear Todd,
>
> I got the concept, but I have no idea how to use a side input in the
> mapper class. Can you guide me more on that?
>
> Pankil
>
> On Thu, Jul 9, 2009 at 1:39 PM, Todd Lipcon <[email protected]> wrote:
>
> > Hi Pankil,
> >
> > Basically there are two steps here: first sort the two files, then merge
> > them. The sort can be done using a MapReduce job where the mapper
> > extracts the join column as the key.
> >
> > If you make sure you have the same number of reducers (and partition by
> > the equijoin column) for both sorts, then you'll end up with:
> >
> > A        B
> > part-0  part-0
> > part-1  part-1
> >
> > etc
> >
> > Each corresponding part file will be in sorted order, and you can perform
> > the merge.
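
A rough sketch of why the matching reducer count matters, in plain Java with no Hadoop dependency. The class and method names are made up for illustration, and String keys stand in for whatever Writable key type the real job would use. The partition formula is the one Hadoop's default HashPartitioner applies to the key's hash code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the two sort jobs; not Hadoop code.
class PartitionSketch {

    // Clear the sign bit of the hash, then take the remainder, so the
    // result is always in [0, numReducers).
    static int partitionFor(String joinKey, int numReducers) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates one sort job: route each key to its partition, then sort
    // every partition, like the part-N files a real job would write.
    static List<List<String>> sortIntoParts(List<String> joinKeys, int numReducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : joinKeys) {
            parts.get(partitionFor(key, numReducers)).add(key);
        }
        for (List<String> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        int numReducers = 2; // must be identical for both sorts
        List<List<String>> a = sortIntoParts(Arrays.asList("k3", "k1", "k2"), numReducers);
        List<List<String>> b = sortIntoParts(Arrays.asList("k2", "k3", "k9"), numReducers);
        for (int i = 0; i < numReducers; i++) {
            System.out.println("part-" + i + "  A=" + a.get(i) + "  B=" + b.get(i));
        }
    }
}
```

Because the partition number depends only on the key and the reducer count, any key present in both A and B lands in the same part number on both sides. Change the reducer count for only one of the sorts and that guarantee goes away.
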
> >
> > To do the merge, you can just pick either A or B as your input for
> > locality hints, and then, in the mapper, given the file name, determine
> > the filename of the other partition. Open that up as a side input in your
> > mapper and perform the merge like you would in a non-distributed setting.
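
For what it's worth, here is a minimal sketch of that merge step in plain Java, leaving Hadoop out entirely. Everything named here (MergeJoinSketch, siblingPart, mergeJoin, the /sorted/A and /sorted/B directories) is hypothetical, and in-memory lists of {key, value} pairs stand in for the mapper's own input and the side input. It assumes both lists are already sorted by key using the same ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the merge step; not Hadoop code.
class MergeJoinSketch {

    // Given the path of the part file the mapper is reading, build the path
    // of the same-numbered part file in the other sorted directory.
    static String siblingPart(String inputPath, String otherDir) {
        String fileName = inputPath.substring(inputPath.lastIndexOf('/') + 1);
        return otherDir + "/" + fileName;
    }

    // Classic sort-merge equijoin over two key-sorted record lists.
    // Each record is a {key, value} pair; output is "key<TAB>left<TAB>right".
    static List<String> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key has no match yet, advance left
            } else if (cmp > 0) {
                j++; // right key has no match yet, advance right
            } else {
                String key = left.get(i)[0];
                // Find the run of right records sharing this key.
                int runEnd = j;
                while (runEnd < right.size() && right.get(runEnd)[0].equals(key)) {
                    runEnd++;
                }
                // Pair every left record with this key against the whole run.
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    for (int k = j; k < runEnd; k++) {
                        out.add(key + "\t" + left.get(i)[1] + "\t" + right.get(k)[1]);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(siblingPart("/sorted/A/part-00001", "/sorted/B"));
        List<String[]> a = Arrays.asList(
                new String[] {"b", "2"}, new String[] {"d", "4"});
        List<String[]> b = Arrays.asList(
                new String[] {"b", "x"}, new String[] {"c", "y"});
        for (String row : mergeJoin(a, b)) {
            System.out.println(row);
        }
    }
}
```

In a real mapper the right-hand side would be streamed from the side file rather than held in a list, so the run of records for the current key would have to be buffered while the left side catches up.
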
> >
> > Hope this helps
> > -Todd
> >
> >
> > On Thu, Jul 9, 2009 at 9:09 AM, Pankil Doshi <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Does anyone have a hint on how to implement a "SORT-MERGE JOIN" using
> > > the map-reduce paradigm?
> > > I read an article about it on the Pig wiki but did not get clarity, as
> > > it doesn't show it in the form of map and reduce.
> > >
> > > Pankil
> > >
> >
>
