Check out the data_join package under hadoop/contrib.
It provides a generic framework for various join operations.
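The core idea behind a reduce-side join like the one you describe is to tag each record with the file it came from, key it on B, and let the shuffle bring matching rows together. The following is a minimal in-memory sketch of that pattern in plain Python. It only simulates the map, shuffle and reduce steps and does not use the data_join API. The function names (map_phase, shuffle, reduce_phase, join) and the "L"/"R" tags are hypothetical.

```python
from collections import defaultdict

def map_phase(left_rows, right_rows):
    """Emit (key, (tag, payload)) pairs keyed on B; the tag records the source file."""
    for a, b, c in left_rows:      # first file: A B C
        yield b, ("L", (a, c))
    for b, d, e in right_rows:     # second file: B D E
        yield b, ("R", (d, e))

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Inner-join each key's left rows with its right rows, emitting A B C D E."""
    for b in sorted(groups):
        values = groups[b]
        lefts = [p for tag, p in values if tag == "L"]
        rights = [p for tag, p in values if tag == "R"]
        # With B unique in the second file, rights has at most one entry.
        for a, c in lefts:
            for d, e in rights:
                yield (a, b, c, d, e)

def join(left_rows, right_rows):
    return list(reduce_phase(shuffle(map_phase(left_rows, right_rows))))

left = [("a1", "k1", "c1"), ("a2", "k1", "c2"), ("a3", "k2", "c3"), ("a4", "k9", "c4")]
right = [("k1", "d1", "e1"), ("k2", "d2", "e2"), ("k3", "d3", "e3")]
print(join(left, right))
# [('a1', 'k1', 'c1', 'd1', 'e1'), ('a2', 'k1', 'c2', 'd1', 'e1'), ('a3', 'k2', 'c3', 'd2', 'e2')]
```

Keys that appear in only one file (k3, k9 above) produce no output, which gives inner-join semantics. Emitting the unmatched side instead would turn it into an outer join.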


Runping


> -----Original Message-----
> From: C G [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: [email protected]
> Subject: JOIN-type operations with Hadoop...
> 
> Consider two row based files.  The first has fields:
> 
>       A B C
> 
>   the second has fields:
> 
>      B D E
> 
>   I want to join these files on the key B, to create records of the form:
> 
>     A B C D E
> 
>   So B can be thought of as a primary key, and the second file will only
> contain distinct values of B, i.e. no repeats.
> 
>   I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
> 
>   Does the community have any wisdom to share?
> 
>   Thanks,
>   C G
> 
> 
