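A = load 'yourdata' as (user:chararray, date:chararray, purchase:chararray); -- field names and types assumed from your description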
B = group A by user;
C = foreach B {
      C1 = order A by date;
      C2 = limit C1 1;
      generate flatten(C2);
}
D = filter C by date >= '2010'; -- assuming your dates are strings like '20100920'; note that date > '2009' would also keep 2009 dates, since '20090101' sorts after '2009'

I think this will give you all users who made their first purchase in 2010.
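Since the question asks for a count rather than the users themselves, here is a rough sketch of one way to get there. It is not the script above: it swaps the nested order/limit for MIN over each user's dates, which yields the same first-purchase date when only the date is needed. The file name, the field names, the 'yyyymmdd' string format, and the aliases first_date and first_time_buyers are all assumptions for illustration.

```pig
-- Assumed input: tab-separated rows of (user, date, purchase), dates as 'yyyymmdd' strings.
A = load 'yourdata' as (user:chararray, date:chararray, purchase:chararray);
B = group A by user;
-- The earliest date per user is that user's first purchase.
C = foreach B generate group as user, MIN(A.date) as first_date;
-- Keep only users whose first purchase falls inside 2010.
D = filter C by first_date >= '20100101' and first_date <= '20101231';
-- Collapse everything into one group and count the remaining users.
E = group D all;
F = foreach E generate COUNT(D) as first_time_buyers;
dump F;
```

The explicit upper bound only matters once the data contains purchases after 2010; the string comparison is only safe if every date has the same fixed-width format.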

Alan.

On Sep 21, 2010, at 3:21 AM, Christian Decker wrote:

Hi all,

once again I can't wrap my head around how to approach a problem in Pig. I'm trying to count a number of elements in a timespan if they are the first that match a criterion. So let's say I have tuples with a date, a user and a purchase, and now I want to count the users that made their first purchase in the year 2010 (my timespan). It's this "first purchase" that troubles me, because usually I'd filter by date, then aggregate purchases by user and then count the resulting rows. As it is right now I'd have to do the above, then repeat the steps for the timespan before, and then subtract that set from the resulting set.
Isn't there an easier way?

Regards,
Chris
