See pig.
This one: http://research.yahoo.com/project/pig

Not this one: http://en.wikipedia.org/wiki/Pig

On 9/13/07 10:45 AM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:

> On a related note - has anyone seen proposals or ideas for languages on
> top of hadoop map/reduce (could even be languages for some sort of code
> generators) to make writing the joins easy. It is quite a nightmare to
> write these joins especially when it involves multiple data sources. We
> are thinking of doing something similar. I wanted to find out if someone
> else has some ideas to share.
>
> Thanks,
> Ashish
>
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:43 AM
> To: [email protected]
> Subject: RE: JOIN-type operations with Hadoop...
>
> We use the directory namespace to distinguish different types of files.
> Wrote a simple wrapper around TextInputFormat/SequenceFileInputFormat -
> such that the key returned is the pathname (or some component of the
> pathname). That way u can look at the key - and then decide what kind of
> record structure the value encodes and take the proper action.
>
> Ping me if u want an example and will be happy to share.
>
>
> -----Original Message-----
> From: C G [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: [email protected]
> Subject: JOIN-type operations with Hadoop...
>
> Consider two row based files. The first has fields:
>
>   A B C
>
> the second has fields:
>
>   B D E
>
> I want to join these files on the key B, to create records of the
> form:
>
>   A B C D E
>
> So B can be thought of as a primary key, and the second file will only
> have distinct values of B...i.e. no repeats.
>
> I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
>
> Does the community have any wisdom to share?
>
> Thanks,
> C G
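
For illustration, the path-as-key idea from the quoted thread can be sketched as a reduce-side join on B. This is plain Python simulating the map, shuffle, and reduce steps, not Hadoop API code and not the wrapper mentioned above. The `/left/` and `/right/` directory names, the whitespace-separated record layout, and all function names are assumptions made up for the example.

```python
from collections import defaultdict

def map_record(path, line):
    """Tag a record by its source directory and key it on the join field B."""
    fields = line.split()
    if "/left/" in path:              # assumed directory holding "A B C" rows
        a, b, c = fields
        return b, ("left", (a, c))
    b, d, e = fields                  # assumed directory holding "B D E" rows
    return b, ("right", (d, e))

def reduce_join(b, tagged_values):
    """Combine both sides for one key; yields nothing if either side is missing."""
    lefts = [v for tag, v in tagged_values if tag == "left"]
    rights = [v for tag, v in tagged_values if tag == "right"]
    for d, e in rights:               # at most one row here if B is unique
        for a, c in lefts:
            yield (a, b, c, d, e)

def join(inputs):
    """inputs: iterable of (path, line); grouping by key stands in for the shuffle."""
    groups = defaultdict(list)
    for path, line in inputs:
        key, tagged = map_record(path, line)
        groups[key].append(tagged)
    rows = []
    for key in sorted(groups):
        rows.extend(reduce_join(key, groups[key]))
    return rows
```

Because the reducer only emits rows when both sides are present, this behaves like an inner join: a B value that appears in only one file produces no output.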
