Ankur C. Goel
Tue, 09 Feb 2010 04:49:44 -0800
Felix, if I understand correctly, what you need is to output a bag that has n-1 tuples if n > 1, where n is the original number of tuples in the group's bag. At the moment you will have to implement a custom UDF that takes your bag as input and outputs a bag meeting that criterion. Refer to http://wiki.apache.org/pig/UDFManual for help on implementing the UDF.
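To make the n-1 rule concrete, here is a rough sketch of just the logic such a UDF would apply to each group's bag. It is plain Python rather than Pig UDF code (an actual UDF would be written in Java against Pig's EvalFunc API), and the names drop_one and group_and_drop are made up for illustration. It also assumes that any one tuple may be dropped; this sketch happens to drop the first one it sees.

```python
from collections import defaultdict


def drop_one(bag):
    """Return the bag minus one tuple if it holds more than one tuple.

    Assumption: it does not matter which tuple is dropped, so this
    sketch simply removes the first one. Bags of size 0 or 1 are
    returned unchanged.
    """
    bag = list(bag)
    return bag[1:] if len(bag) > 1 else bag


def group_and_drop(rows, key_len=3):
    """Group rows by their first key_len fields, then apply drop_one
    to each group's bag (hypothetical helper, mimics GROUP ... BY)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[:key_len])].append(tuple(row))
    return {key: drop_one(bag) for key, bag in groups.items()}


# Example rows with fields f1..f6; the first three form the group key.
rows = [
    ("a", 1, "x", 10, 20, 30),
    ("a", 1, "x", 10, 20, 30),  # duplicate values within a group are fine
    ("a", 1, "x", 11, 21, 31),
    ("b", 2, "y", 12, 22, 32),
]
result = group_and_drop(rows)
```

With these rows, the group keyed on ("a", 1, "x") goes from three tuples to two, and the single-tuple group keyed on ("b", 2, "y") is left alone.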
Regards
-...@nkur
On 2/9/10 4:00 PM, "Gerrit Jansen van Vuuren" <gvanvuu...@specificmedia.com>
wrote:
Hi,
Hope this helps:
Example:
A = LOAD 'test.txt' as(p,q);
DUMP A;
(a 1)
(b 2)
(a 1)
(a 3)
(b 2)
(a 1)
B = GROUP A BY (p,q);
DUMP B;
((a 1,),{(a 1,),(a 1,),(a 1,)})
((a 3,),{(a 3,)})
((b 2,),{(b 2,),(b 2,)})
C = FOREACH B {
    -- LIMIT keeps only one tuple from the bag; you can FLATTEN it
    -- afterwards to get the actual fields from the tuple.
    oneTuple = LIMIT A 1;
    GENERATE group.p, group.q, FLATTEN(oneTuple);
}
DUMP C;
(a 1,,a 1,)
(a 3,,a 3,)
(b 2,,b 2,)
Cheers,
Gerrit
-----Original Message-----
From: felix gao [mailto:gre1...@gmail.com]
Sent: Monday, February 08, 2010 7:44 PM
To: pig-user@hadoop.apache.org
Subject: Dropping one tuple from the bag purposefully
OK, this might sound a little weird.
My schema is f1, f2, f3, f4, f5, f6.
When I group by f1, f2, f3, I need to drop exactly one tuple from any group
that has more than one tuple. Also, the same tuple values could occur more
than once within a group.
Is there a way to do this in Pig?
Thanks,
Felix