Hi Mark,

Have a look at CompositeInputFormat. I guess it is what you are looking for to achieve map-side joins. If you are fine with a reduce-side join, go with MultipleInputFormat. I have tried the same sort of joins using MultipleInputFormat and have scribbled something on the topic; check whether it'd be useful for you (a very crude implementation :), you may have better ways):
http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html
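For reference, a minimal sketch of what a CompositeInputFormat map-side join setup looks like with the old mapred API. The paths and input-format class here are placeholders, not from Mark's job; both inputs must already be sorted and identically partitioned for the join to be valid:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class JoinJobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf(JoinJobSetup.class);
        // Tell the framework to perform the join while reading splits,
        // before the mapper ever runs.
        conf.setInputFormat(CompositeInputFormat.class);
        // "inner" joins only keys present in both inputs; "outer" keeps all.
        // The two paths below are illustrative placeholders.
        conf.set("mapred.join.expr",
            CompositeInputFormat.compose("inner",
                KeyValueTextInputFormat.class,
                new Path("/data/left"), new Path("/data/right")));
        // The mapper then receives each key once, with a TupleWritable
        // holding one value from each joined input.
        return conf;
    }
}
```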
Hope it helps!...

Regards
Bejoy.K.S

On Sun, Jan 15, 2012 at 4:34 PM, Mike Spreitzer <mspre...@us.ibm.com> wrote:
> BTW, each key appears exactly once in the large constant dataset, and
> exactly once in each MR job's output.
>
> I am thinking the right approach is to consistently partition the job
> output and the large constant dataset, with the number of partitions being
> the number of reduce tasks; each part goes into its own file. Make an
> InputFormat whose number of splits equals the number of reduce tasks.
> Reading a split will consist of reading a corresponding pair of files,
> stepping through each. Seems like something that should already be
> provided by something in org.apache.hadoop.mapreduce.*.
>
> Thanks,
> Mike
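The per-split pairing Mike describes is essentially a sorted-merge join: step through two sorted, identically partitioned inputs in lockstep and emit matched pairs. A minimal sketch in plain Java of that merge step, using in-memory sorted maps to stand in for the two part files of a split (all names here are illustrative, not an existing Hadoop API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

public class MergeJoin {
    // Joins two key-sorted inputs in which each key appears at most once,
    // as in Mike's setup. A real record reader would stream the two part
    // files of the split instead of iterating in-memory maps.
    public static List<String> join(SortedMap<String, String> left,
                                    SortedMap<String, String> right) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> li = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> ri = right.entrySet().iterator();
        Map.Entry<String, String> l = li.hasNext() ? li.next() : null;
        Map.Entry<String, String> r = ri.hasNext() ? ri.next() : null;
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                // Matching key: emit the joined record, advance both sides.
                out.add(l.getKey() + "\t" + l.getValue() + "," + r.getValue());
                l = li.hasNext() ? li.next() : null;
                r = ri.hasNext() ? ri.next() : null;
            } else if (cmp < 0) {
                // Key present only on the left: skip it.
                l = li.hasNext() ? li.next() : null;
            } else {
                // Key present only on the right: skip it.
                r = ri.hasNext() ? ri.next() : null;
            }
        }
        return out;
    }
}
```

Because both inputs are sorted and each key occurs once, each side is scanned exactly once, so a split costs O(n) regardless of how the keys interleave.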