Check out the data_join package under hadoop/contrib. It offers a generic framework for various join operations.
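For intuition, here is a minimal sketch of the reduce-side join idea that this kind of framework is generally built around: each mapper tags a record with the file it came from and emits it under the join key B, and the reducer pairs up the records that share a key. This is plain Python, not the data_join API. The function names (`map_tagged`, `reduce_join`, `join`), the "left"/"right" tags, and the sample rows are all made up for illustration, and the in-memory grouping only stands in for Hadoop's shuffle.

```python
from collections import defaultdict

def map_tagged(records, tag, key_index):
    """Map phase: emit (join key, (source tag, record)) for each input row."""
    for record in records:
        yield record[key_index], (tag, record)

def reduce_join(key, tagged_values):
    """Reduce phase: pair every first-file row with every second-file row
    that shares the same key B, producing (A, B, C, D, E) records."""
    left = [rec for tag, rec in tagged_values if tag == "left"]
    right = [rec for tag, rec in tagged_values if tag == "right"]
    for a, b, c in left:
        for _, d, e in right:
            yield (a, b, c, d, e)

def join(left_rows, right_rows):
    """Group tagged records by key (a stand-in for the shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in map_tagged(left_rows, "left", 1):   # B is field 1 of A B C
        groups[key].append(value)
    for key, value in map_tagged(right_rows, "right", 0):  # B is field 0 of B D E
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined

# Hypothetical sample data: the second file has no repeated values of B.
first_file = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b1", "c3")]
second_file = [("b1", "d1", "e1"), ("b2", "d2", "e2")]
joined = join(first_file, second_file)
```

Because this behaves as an inner join, rows whose B value appears in only one of the files are dropped. An outer join would have to emit those rows with placeholder fields instead.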
Runping

> -----Original Message-----
> From: C G [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: [email protected]
> Subject: JOIN-type operations with Hadoop...
>
> Consider two row-based files. The first has fields:
>
> A B C
>
> the second has fields:
>
> B D E
>
> I want to join these files on the key B, to create records of the form:
>
> A B C D E
>
> So B can be thought of as a primary key, and the second file will only
> contain distinct values of B, i.e. no repeats.
>
> I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
>
> Does the community have any wisdom to share?
>
> Thanks,
> C G
