Hi,
There are two different data types in Pig that matter here:
i) Tuple: a collection of fields, like a database record.
ii) Bag: a collection of tuples, like a database table.
Given
t1 = load table1 as id, listOfId;
If listOfId is a bag, flattening will give you
<1, 2>
<1, 3>
<1, 4>
If listOfId is a tuple, flattening will only remove the tuple wrapping
and you will get
< 1, 2, 3, 4>
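To make the distinction concrete, here is a rough Python model of the two flatten behaviours described above. It is not Pig code: the function name `flatten_field`, and the choice of a Python list to stand in for a bag and a Python tuple for a Pig tuple, are assumptions made purely for illustration.

```python
def flatten_field(record, index):
    """Illustrative model of flattening one field of a record.

    A Python list stands in for a bag and a Python tuple for a Pig tuple
    (both are assumptions of this sketch). Returns the output records.
    """
    before, value, after = record[:index], record[index], record[index + 1:]
    if isinstance(value, list):
        # Bag: one output record per element of the bag.
        rows = []
        for item in value:
            item = item if isinstance(item, tuple) else (item,)
            rows.append(before + item + after)
        return rows
    if isinstance(value, tuple):
        # Tuple: only the wrapping is removed, so exactly one record comes out.
        return [before + value + after]
    # Scalar field: nothing to flatten.
    return [record]


bag_result = flatten_field((1, [2, 3, 4]), 1)
tuple_result = flatten_field((1, (2, 3, 4)), 1)
print(bag_result)    # [(1, 2), (1, 3), (1, 4)]
print(tuple_result)  # [(1, 2, 3, 4)]
```

The bag case multiplies the record, while the tuple case keeps a single, wider record, which is why the type of listOfId decides whether the later join sees each id separately.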
Assuming that listOfId is a bag, the following Pig script is what you
want:
t1 = load table1 as id, listOfId;                -- <1, {2,3,4}>
t2 = load table2 as joinId, f1;                  -- <2, a> <3, b> <4, c>
t3 = foreach t1 generate id, flatten(listOfId);  -- <1, 2> <1, 3> <1, 4>
t4 = join t3 by $1, t2 by joinId;                -- <1, 2, 2, a> <1, 3, 3, b> <1, 4, 4, c>
t5 = foreach t4 generate id, f1;                 -- <1, a> <1, b> <1, c>
t6 = group t5 by id;                             -- <1, {a, b, c}>
t6 contains your result.
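
For anyone who wants to check the data flow without a Pig installation, the same flatten, join, project and group steps can be sketched in plain Python. This is only a model of the script above, not Pig itself: the sample data is hypothetical, the variable names simply mirror the aliases t1 to t6, and Python lists preserve order whereas a Pig bag does not promise any.

```python
from collections import defaultdict

# Hypothetical sample data mirroring the example above.
t1 = [(1, [2, 3, 4])]                 # <id, listOfId>, with listOfId as a bag
t2 = [(2, "a"), (3, "b"), (4, "c")]   # <joinId, f1>

# t3: flatten the bag, giving one row per (id, element) pair.
t3 = [(rec_id, elem) for rec_id, ids in t1 for elem in ids]

# t4: inner join of t3.$1 with t2.joinId, via a lookup keyed by joinId.
by_join_id = defaultdict(list)
for join_id, f1 in t2:
    by_join_id[join_id].append(f1)
t4 = [
    (rec_id, elem, elem, f1)
    for rec_id, elem in t3
    for f1 in by_join_id.get(elem, [])
]

# t5: keep only id and f1.
t5 = [(row[0], row[3]) for row in t4]

# t6: group by id, collecting the f1 values seen for each id.
t6 = defaultdict(list)
for rec_id, f1 in t5:
    t6[rec_id].append(f1)
result = dict(t6)
print(result)  # {1: ['a', 'b', 'c']}
```

Carrying id through the flatten step is what lets the final grouping reassemble the f1 values per original row.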
Utkarsh
On Aug 28, 2007, at 5:58 PM, Joydeep Sen Sarma wrote:
I am misunderstanding something.
Following the intro to Pig Latin doc (p6), the flatten generating 'a'
would generate <1,2,3,4> (and not <1,2>,<1,3>,<1,4>).
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 28, 2007 12:47 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: looking for some help with pig syntax
Sorry, I misunderstood what you were trying to generate. Perhaps the
following will come closer:
t1 = load table1 as id, listOfId; -- <1, <2,3,4>>
t2 = load table2 as id, f1; -- <2,a>,<3,b>,<4,c>
a = foreach t1 generate id, flatten(listOfId); -- <1,2>,<1,3>,<1,4>
b = join a by $1, t2 by id; -- <2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>
c = group b by $1; -- <1,{<2,1,2,2,a>,<3,1,3,3,b>,<4,1,4,4,c>}>
d = foreach c generate group, c.b::$4; -- <1, {<a>,<b>,<c>}>
where <> represents a tuple and {} a bag.
I'm not 100% sure of the syntax c.b::$4 for d, you may have to fiddle
with that to get it right.
Alan.
Joydeep Sen Sarma wrote:
Will it?
Trying an example:
t1 = {<1, <2, 3, 4>>}
t2 = {<2, "alpha">,<3,"beta">,<4,"gamma">}
desired outcome c = {<1, <"alpha", "beta", "gamma">>} /* or alternatively */
c = {<1, <<2,"alpha">,<3,"beta">,<4,"gamma">>>}
but as proposed (I hope I am reading the pig document correctly):
t1a = {<2,3,4>}
b = {<2, 2, "alpha">}
// no point going further - this doesn't seem to be doing what I want ..
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 28, 2007 10:45 AM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Re: looking for some help with pig syntax
I think the following will do what you want.
t1 = load table1 as id, listOfId;
t2 = load table2 as id, f1;
t1a = foreach t1 generate flatten(listOfId); -- flattens the listOfId into a set of ids
b = join t1a by $0, t2 by id; -- join the two together.
c = foreach b generate t2.id, t2.f1; -- project just the ids and f1 entries.
Alan.
Joydeep Sen Sarma wrote:
Specifically, how can we express this query:
Table1 contains: id, (list of ids)
Table2 contains: id, f1
where Table1:list is a variable-length list of foreign keys (id) into
Table2.
We would like to join every element of Table1:list with the
corresponding Table2:id, i.e. the final output should be of the form:
Table3 contains: id, (list of f1)
Couldn't quite figure out how to do this. Does Pig Latin support nested
foreach loops? If there's a more appropriate mailing list, please
redirect.
Thanks,
Joydeep