Hi Pankil,

Simply use the normal FileSystem APIs to open the side input. You can construct a SequenceFile.Reader from a Path and use that class's normal methods (e.g. next(key, value)) to read the records.
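Below is a minimal, stdlib-only Java sketch of what the mapper can do once the side input is open. The mapper streams its own sorted records through `join()` and advances a reader over the partner file in lockstep. It assumes both files are sorted by the same key order (plain `String.compareTo` here) and that each mapper receives a whole part file rather than a mid-file split. The class and method names are hypothetical, and the `Iterator` only stands in for a reader that could be built on `SequenceFile.Reader.next(key, value)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of a map-side merge join against a sorted side input.
// Assumption: the mapper's records and the side file are sorted by the same
// key order (String.compareTo here). In a real job the side iterator could
// wrap SequenceFile.Reader.next(key, value); this class is hypothetical.
final class SideInputMergeJoin {

    /** A key/value pair read from the side input (hypothetical record type). */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    private final Iterator<Rec> side;   // stands in for the opened side reader
    private Rec pending;                // next unread side record, or null at EOF
    private String groupKey;            // key of the buffered side group
    private final List<String> groupValues = new ArrayList<>();

    SideInputMergeJoin(Iterator<Rec> side) {
        this.side = side;
        advance();
    }

    private void advance() {
        pending = side.hasNext() ? side.next() : null;
    }

    /** Call once per mapper input record, in sorted order; returns joined rows. */
    List<String> join(String key, String value) {
        if (!key.equals(groupKey)) {
            groupKey = key;
            groupValues.clear();
            // Skip side records whose key sorts before the current key.
            while (pending != null && pending.key.compareTo(key) < 0) {
                advance();
            }
            // Buffer every side value for this key so repeated keys on the
            // mapper side can reuse them (many-to-many join).
            while (pending != null && pending.key.equals(key)) {
                groupValues.add(pending.value);
                advance();
            }
        }
        List<String> out = new ArrayList<>();
        for (String sideValue : groupValues) {
            out.add(key + "\t" + value + "\t" + sideValue);
        }
        return out;
    }
}
```

With a side file holding keys `a, b, b, d`, calling `join("b", "x")` and then `join("b", "y")` returns both `b` matches each time, because the side group for a key is buffered rather than consumed once; a key with no partner, such as `c`, yields an empty list.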
-Todd

On Thu, Jul 9, 2009 at 11:12 AM, Pankil Doshi wrote:
> Dear Todd,
>
> I got the concept, but I have no idea how to use a side input in the
> mapper class. Can you guide me more on that?
>
> Pankil
>
> On Thu, Jul 9, 2009 at 1:39 PM, Todd Lipcon wrote:
>
> > Hi Pankil,
> >
> > Basically there are two steps here. The first is to sort the two files.
> > This can be done using a MapReduce job where the mapper extracts the join
> > column as the key.
> >
> > If you make sure you use the same number of reducers (and partition by
> > the equijoin column) for both sorts, then you'll end up with:
> >
> > A        B
> > part-0   part-0
> > part-1   part-1
> >
> > etc.
> >
> > Each pair of corresponding part files will be in sorted order, and you
> > can perform the merge.
> >
> > To do the merge, you can just pick either A or B as your input for
> > locality hints, and then, in the mapper, given the file name, determine
> > the filename of the other partition. Open that up as a side input in your
> > mapper and perform the merge like you would in a non-distributed setting.
> >
> > Hope this helps
> > -Todd
> >
> >
> > On Thu, Jul 9, 2009 at 9:09 AM, Pankil Doshi wrote:
> >
> > > Hi,
> > >
> > > Does anyone have a hint on how to implement a "SORT-MERGE JOIN" using
> > > the map-reduce paradigm?
> > > I read an article about it on the Pig wiki but did not get clarity, as
> > > it doesn't show it in the form of map and reduce.
> > >
> > > Pankil
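The sort-step preconditions described in the thread (same partitioning on the join column, same number of reducers for both sorts, and a way for the mapper to find the partner part file) can be sketched as follows. `partitionFor` uses the same formula as Hadoop's default `HashPartitioner`; `partnerPath` and the `/data/A_sorted`, `/data/B_sorted` layout are hypothetical. In a new-API mapper, the current file's path could be obtained with something like `((FileSplit) context.getInputSplit()).getPath()`.

```java
// Sketch of the sort-step preconditions for a map-side merge join.
// Assumption: both sorts use the same key type, the same partitioner and the
// same number of reducers, so part-N of A and part-N of B cover the same keys.
final class PartitionPairing {

    private PartitionPairing() {}

    /**
     * Same formula as Hadoop's default HashPartitioner. The key's own
     * hashCode() is used, so both jobs must emit the same key type
     * (e.g. Text in both), not merely keys with equal string content.
     */
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /**
     * Hypothetical layout: sorted outputs live in sibling directories, so the
     * partner keeps the same file name, e.g.
     * /data/A_sorted/part-00003 -> /data/B_sorted/part-00003.
     */
    static String partnerPath(String myPath, String otherDir) {
        String fileName = myPath.substring(myPath.lastIndexOf('/') + 1);
        return otherDir + "/" + fileName;
    }
}
```

Because a key always hashes to the same partition index in both jobs, any key in A's `part-N` can only appear in B's `part-N`, which is what lets each mapper merge just one pair of files.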
