Lance - Fun to see you on a mailing list. How are things?
;;peter

On 08/18/10 22:11, Lance Norskog wrote:
> Hadoop has a toolkit called 'map-side joins' which requires sorted
> input tables. org.apache.hadoop.examples.Join.java shows how. Good
> luck decoding it!
>
> Could you use chained mapper tasks to sort each input set before using
> the join framework?
>
> On Wed, Aug 18, 2010 at 10:10 AM, y l <[email protected]> wrote:
>> Hi,
>>
>> My first email on the list, and overall pretty new to Hadoop, so I'm hoping
>> to find some help with a new task I have to do for work.
>> I need to do a join between 2 sets of files. One is a bunch of csv files and
>> the other set is sequence files.
>>
>> I was told MultiFilterRecorderReader could help me do the join, but I
>> haven't been successful to find some good example on where and how to use
>> that class to do the join.
>> I have found a good example using CompositeInputFormat here:
>> http://www.congiu.com/node/5
>> But it assumes that the input is sorted and I can't guarantee that it will
>> be on the csv files at least.
>>
>> Anyone knows what I need to do with that MultiFilterRecorderReader? Inherit
>> it on the mapper? I'm a little confused... Please let me know if you have
>> any pointers on that one.
>>
>> Thanks.
