Hi,

There are two different compound data types in Pig that matter here:

i) Tuple: an ordered collection of fields, like a database record.
ii) Bag: an unordered collection of tuples, like a database table.

Consider:
t1 = load table1 as id, listOfId;

If listOfId is a bag, as in <1, {2, 3, 4}>, flattening will give you
<1, 2>
<1, 3>
<1, 4>

If listOfId is a tuple, as in <1, <2, 3, 4>>, flattening only removes the tuple wrapping, and you will get
<1, 2, 3, 4>
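A minimal Python sketch of the two cases, assuming a tuple is modeled as a Python tuple and a bag as a list of tuples. This is a modeling choice for illustration, not Pig's API, and the helper names flatten_bag and flatten_tuple are hypothetical:

```python
# Hypothetical model of Pig values: a tuple is a Python tuple,
# a bag is a list of tuples. Each function returns the output rows.

def flatten_bag(prefix, bag):
    """Flattening a bag field yields one output row per tuple in the bag."""
    return [prefix + inner for inner in bag]

def flatten_tuple(prefix, tup):
    """Flattening a tuple field just splices its fields into the same row."""
    return [prefix + tup]

row_id = (1,)
as_bag = [(2,), (3,), (4,)]  # listOfId as the bag {2, 3, 4}
as_tuple = (2, 3, 4)         # listOfId as the tuple <2, 3, 4>

print(flatten_bag(row_id, as_bag))      # [(1, 2), (1, 3), (1, 4)]
print(flatten_tuple(row_id, as_tuple))  # [(1, 2, 3, 4)]
```

Flattening a bag multiplies rows, while flattening a tuple keeps a single row and only makes it wider; that is why the declared type of listOfId decides whether a later join sees three keys or one.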

Assuming that listOfId is a bag, the following Pig script is what you want (the contents of each relation are shown after each statement):

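t1 = load table1 as id, listOfId;                -- <1, {2, 3, 4}>
t2 = load table2 as joinId, f1;                  -- <2, a> <3, b> <4, c>
t3 = foreach t1 generate id, flatten(listOfId);  -- <1, 2> <1, 3> <1, 4>
t4 = join t3 by $1, t2 by joinId;                -- <1, 2, 2, a> <1, 3, 3, b> <1, 4, 4, c>
t5 = foreach t4 generate id, f1;                 -- <1, a> <1, b> <1, c>
t6 = group t5 by id;                             -- <1, {<1, a>, <1, b>, <1, c>}>

t6 contains your result. Strictly, each tuple in t6's bag still carries id; to get exactly <1, {a, b, c}>, project f1 out of the bag with: t7 = foreach t6 generate group, t5.f1;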
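To check the intermediate results, the script can be traced in plain Python. This is a sketch under stated assumptions: relations are lists of tuples, a bag is a list, the join is a naive nested loop (not how Pig actually executes it), and GROUP is modeled as returning the group key plus a bag of the complete input tuples:

```python
from collections import defaultdict

# Sample data taken from the script above; relations are lists of Python tuples.
t1 = [(1, [(2,), (3,), (4,)])]       # <id, listOfId>, with listOfId as the bag {2, 3, 4}
t2 = [(2, "a"), (3, "b"), (4, "c")]  # <joinId, f1>

# t3 = foreach t1 generate id, flatten(listOfId);  -> one row per bag element
t3 = [(row_id,) + inner for row_id, bag in t1 for inner in bag]

# t4 = join t3 by $1, t2 by joinId;  -> naive nested-loop equijoin, fields concatenated
t4 = [left + right for left in t3 for right in t2 if left[1] == right[0]]

# t5 = foreach t4 generate id, f1;  -> id is field 0, f1 is field 3 of the joined row
t5 = [(row[0], row[3]) for row in t4]

# t6 = group t5 by id;  -> (group key, bag of the complete t5 tuples)
groups = defaultdict(list)
for row in t5:
    groups[row[0]].append(row)
t6 = list(groups.items())

# Projecting f1 out of each bag gives the compact <1, {a, b, c}> form.
result = [(key, [row[1] for row in bag]) for key, bag in t6]

print(t6)      # [(1, [(1, 'a'), (1, 'b'), (1, 'c')])]
print(result)  # [(1, ['a', 'b', 'c'])]
```

The last projection is the step that turns <1, {<1, a>, <1, b>, <1, c>}> into <1, {a, b, c}>.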

Utkarsh



On Aug 28, 2007, at 5:58 PM, Joydeep Sen Sarma wrote:


I am misunderstanding something.

Following the intro to Pig Latin doc (p. 6), the flatten generating 'a' would
generate <1,2,3,4> (and not <1,2>,<1,3>,<1,4>).


-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 28, 2007 12:47 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: looking for some help with pig syntax

Sorry, I misunderstood what you were trying to generate.  Perhaps the
following will come closer:

t1 = load table1 as id, listOfId; -- <1, <2,3,4>>
t2 = load table2 as id, f1; -- <2,a>,<3,b>,<4,c>
a = foreach t1 generate id, flatten(listOfId); -- <1,2>,<1,3>,<1,4>
b = join a by $1, t2 by id; -- <2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>
c = group b by $1; -- <1,{<2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>}>
d = foreach c generate group, c.b::$4; -- <1, {<a>,<b>,<c>}>

where <> represents a tuple and {} a bag.

I'm not 100% sure of the syntax c.b::$4 for d, you may have to fiddle
with that to get it right.

Alan.




Joydeep Sen Sarma wrote:
Will it?

Trying an example:

t1 = {<1, <2, 3, 4>>}
t2 = {<2, "alpha">,<3,"beta">,<4,"gamma">}

desired outcome c = {<1, <"alpha", "beta", "gamma">>} /* or alternatively */
                c = {<1, <<2,"alpha">,<3,"beta">,<4,"gamma">>>}
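As a reference for what the desired outcome means, here is a plain nested-loop version in Python. It assumes Table2 ids are unique, so a dict lookup stands in for the join, and ids with no match are skipped, as an inner join would skip them:

```python
# Reference semantics for the desired result, written as a plain nested loop.
t1 = [(1, (2, 3, 4))]
t2 = [(2, "alpha"), (3, "beta"), (4, "gamma")]

lookup = dict(t2)  # assumes Table2.id is unique (it is used as a foreign key)

# First form: <id, <f1, f1, ...>>
c = [(row_id, tuple(lookup[k] for k in ids if k in lookup)) for row_id, ids in t1]

# Alternative form: <id, <<id2, f1>, ...>>
c_alt = [(row_id, tuple((k, lookup[k]) for k in ids if k in lookup)) for row_id, ids in t1]

print(c)      # [(1, ('alpha', 'beta', 'gamma'))]
print(c_alt)  # [(1, ((2, 'alpha'), (3, 'beta'), (4, 'gamma')))]
```

Python preserves list order here; a Pig bag is unordered, so the order of alpha, beta and gamma in real Pig output is not guaranteed.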

But as proposed (I hope I am reading the Pig document correctly):

t1a = {<2,3,4>}
b = {<2, 2, "alpha">}

// no point going further; this doesn't seem to be doing what I want...
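The trace can be reproduced in Python, modeling relations as lists of tuples and the join as a naive nested loop that concatenates matching rows (an illustrative assumption, not Pig internals). The helper join_on_first is hypothetical, and the sketch covers both readings of listOfId:

```python
# Relations are modeled as lists of Python tuples; a bag is a list of tuples.
t2 = [(2, "alpha"), (3, "beta"), (4, "gamma")]

def join_on_first(left_rows, right_rows):
    """Naive inner join of left.$0 with right.$0, concatenating matched rows."""
    return [l + r for l in left_rows for r in right_rows if l[0] == r[0]]

# Reading 1: listOfId is the tuple <2, 3, 4>.
t1_tuple = [(1, (2, 3, 4))]
# t1a = foreach t1 generate flatten(listOfId);  -> fields spliced, id not generated
t1a_tuple = [ids for _id, ids in t1_tuple]
b_tuple = join_on_first(t1a_tuple, t2)

# Reading 2: listOfId is the bag {2, 3, 4}.
t1_bag = [(1, [(2,), (3,), (4,)])]
# Flattening a bag yields one row per element, but id is still not generated.
t1a_bag = [inner for _id, bag in t1_bag for inner in bag]
b_bag = join_on_first(t1a_bag, t2)

print(t1a_tuple)  # [(2, 3, 4)]
print(b_tuple)    # [(2, 3, 4, 2, 'alpha')]
print(b_bag)      # [(2, 2, 'alpha'), (3, 3, 'beta'), (4, 4, 'gamma')]
```

The join carries every field of t1a, so the tuple reading gives <2, 3, 4, 2, "alpha"> rather than the shortened <2, 2, "alpha">. In both readings the outer id 1 never reaches the join output, so there is nothing left to group the f1 values under.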


-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 28, 2007 10:45 AM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: looking for some help with pig syntax

I think the following will do what you want.

t1 = load table1 as id, listOfId;
t2 = load table2 as id, f1;
t1a = foreach t1 generate flatten(listOfId); -- flattens listOfId into a set of ids
b = join t1a by $0, t2 by id; -- join the two together.
c = foreach b generate t2.id, t2.f1; -- project just the ids and f1 entries.

Alan.

Joydeep Sen Sarma wrote:

Specifically, how can we express this query:

Table1 contains: id, (list of ids)
Table2 contains: id, f1

where Table1:list is a variable-length list of foreign keys (ids) into Table2.

We would like to join every element of Table1:list with the corresponding Table2:id, i.e. the final output should be of the form:

Table3 contains: id, (list of f1)

I couldn't quite figure out how to do this. Does Pig Latin support nested foreach loops? If there's a more appropriate mailing list, please redirect me.

Thanks,
Joydeep