RE: JOIN-type operations with Hadoop...

Ashish Thusoo Mon, 17 Sep 2007 11:39:40 -0700

Thanks for the pointer. 

We did take a look at pig and did find that it has some of the constructs
that we have been talking about. How stable is the pig software? Has
anyone on this list used it?


Thanks,
Ashish

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 13, 2007 11:10 AM
To: [email protected]
Subject: Re: JOIN-type operations with Hadoop...



See pig.

This one:  http://research.yahoo.com/project/pig

Not this one: http://en.wikipedia.org/wiki/Pig

On 9/13/07 10:45 AM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:

> On a related note - has anyone seen proposals or ideas for languages on
> top of hadoop map/reduce (could even be languages for some sort of code
> generators) to make writing the joins easy? It is quite a nightmare to
> write these joins, especially when they involve multiple data sources. We
> are thinking of doing something similar. I wanted to find out if someone
> else has some ideas to share.
> 
> Thanks,
> Ashish
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:43 AM
> To: [email protected]
> Subject: RE: JOIN-type operations with Hadoop...
> 
> We use the directory namespace to distinguish different types of files.
> Wrote a simple wrapper around TextInputFormat/SequenceFileInputFormat -
> such that the key returned is the pathname (or some component of the
> pathname). That way u can look at the key - and then decide what kind of
> record structure the value encodes and take the proper action.
> 
> Ping me if u want an example and will be happy to share.
> 
> 
> -----Original Message-----
> From: C G [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 7:11 AM
> To: [email protected]
> Subject: JOIN-type operations with Hadoop...
> 
> Consider two row based files.  The first has fields:
>    
>       A B C
>    
>   the second has fields:
>    
>      B D E 
>    
>   I want to join these files on the key B, to create records of the
> form:
>    
>     A B C D E
>    
>   So B can be thought of as a primary key, and the second file will only
> contain distinct values of B...i.e. no repeats.
>    
>   I'm trying to reason through how to do this type of join operation in
> Hadoop but am unsure how to proceed with different "types" of files.
>    
>   Does the community have any wisdom to share?
>    
>   Thanks,
>   C G
> 
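
For readers of the archive, here is a minimal sketch of the join asked about
at the bottom of this thread (files with fields A B C and B D E, joined on B),
using the pathname-as-tag idea described in the reply about the directory
namespace. It is plain Python that imitates the map, shuffle and reduce steps
in memory; it does not use the Hadoop API and it is not the wrapper mentioned
in that reply. The directory names "left" and "right", the function names, the
whitespace-separated record format and the choice of an inner join are all
assumptions made for illustration.

```python
from collections import defaultdict


def map_record(path, line):
    """Map step: emit (B, tagged payload), using the path to pick the record layout.

    Assumed (hypothetical) layout: files under "left/" hold "A B C" records,
    files under "right/" hold "B D E" records.
    """
    source = path.split("/")[0]
    fields = line.split()
    if source == "left":
        a, b, c = fields
        return b, ("left", (a, c))
    if source == "right":
        b, d, e = fields
        return b, ("right", (d, e))
    raise ValueError("unknown input directory: " + source)


def reduce_join(key, tagged_values):
    """Reduce step: combine all left records for one B with the single right record."""
    right = None
    lefts = []
    for tag, payload in tagged_values:
        if tag == "right":
            right = payload  # the second file has no repeated B values
        else:
            lefts.append(payload)
    if right is None:
        return []  # inner join (an assumption): drop left rows with no match
    d, e = right
    return [(a, key, c, d, e) for a, c in lefts]


def join(inputs):
    """Group mapped records by key (a stand-in for the shuffle), then reduce."""
    groups = defaultdict(list)
    for path, line in inputs:
        key, value = map_record(path, line)
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


if __name__ == "__main__":
    sample = [
        ("left/part-0", "a1 b1 c1"),
        ("left/part-0", "a2 b1 c2"),
        ("right/part-0", "b1 d1 e1"),
    ]
    for row in join(sample):
        print(" ".join(row))
```

In a real job the grouping by B is done by the framework's shuffle, and the
tag would come from whatever key the input format returns for each file.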
