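A = load 'yourdata' as (date:chararray, user:chararray, purchase:chararray); -- field names assumed
B = group A by user;
C = foreach B {
    C1 = order A by date;   -- sort each user's purchases by date
    C2 = limit C1 1;        -- keep only the earliest purchase
    generate flatten(C2);
}
-- assuming your dates are strings like '20100920', string comparison works,
-- but use >= '2010' rather than > '2009': '20090315' > '2009' is also true
D = filter C by date >= '2010' and date < '2011';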
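I think this will give you all users who made their first purchase in 2010. Since each user contributes exactly one row to C, counting the rows of D gives the number of such users.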
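Since the question asks for a count rather than a list, one possible final step is sketched below. It is a minimal sketch that assumes the relation D from the script above, with the assumed schema (date, user, purchase); E and F are hypothetical alias names.

```pig
-- hypothetical follow-up: count the users left in D
E = group D all;                  -- put every row of D into a single bag
F = foreach E generate COUNT(D);  -- D holds one row per user, so this is the user count
dump F;
```

COUNT skips tuples whose first field is null. With the assumed schema that field is date, and the filter has already dropped any null dates, so nothing is lost here.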
Alan.
On Sep 21, 2010, at 3:21 AM, Christian Decker wrote:
Hi all,
Once again I can't wrap my head around how to approach a problem in Pig. I'm trying to count the elements in a timespan that are the first to match a criterion. So let's say I have tuples with a date, a user and a purchase, and I want to count the users who made their first purchase in the year 2010 (my timespan). It's this "first purchase" that troubles me: usually I'd filter by date, then aggregate purchases by user and count the resulting rows. As it is right now, I'd have to do the above, then repeat the steps for the timespan before and subtract that set from the resulting set.
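For comparison, the subtract approach described above might look roughly like this. It is a hypothetical sketch, not code from the thread, assuming the same field names (date, user, purchase) and 'yyyymmdd' date strings; all alias names are illustrative. A single cogroup does the set difference, because each user ends up in one group holding a bag of 2010 purchases and a bag of earlier ones.

```pig
A = load 'yourdata' as (date:chararray, user:chararray, purchase:chararray); -- schema assumed
-- purchases inside the timespan, and purchases before it
InSpan = filter A by date >= '2010' and date < '2011';
Before = filter A by date < '2010';
-- one group per user, with one bag per input relation
G = cogroup InSpan by user, Before by user;
-- set difference: bought in 2010, never bought before 2010
NewUsers = filter G by not IsEmpty(InSpan) and IsEmpty(Before);
H = group NewUsers all;
Cnt = foreach H generate COUNT(NewUsers);  -- one row per user, so this counts users
dump Cnt;
```

The nested order/limit in the reply avoids the second filter and the cogroup, at the cost of sorting each user's bag.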
Isn't there an easier way?
Regards,
Chris