Hi,

I'm writing a script to perform some analytics on a set of events occurring
in a set of apps.
I'm using Pig 0.11 and Hadoop 1.3.

Every event contains:

- d: date of the event
- aid: app id
- uid: user id

The aim of my script is to calculate for each application and for each day
in my log the number of unique users during the previous x days (in the
example code that is 2).
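
To make the expected result concrete, here is a minimal reference sketch in Python (not Pig) of what I am trying to compute. The function name `unique_users_per_window` is hypothetical, and the window boundaries are an assumption on my side: I take "previous x days" to mean the x days strictly before the target day. Like my script, it produces one row for every (app, date) combination, with a count of 0 when no users fall in the window.

```python
from datetime import date, timedelta


def unique_users_per_window(events, window_days=2):
    """Count unique users per (aid, target date) over the previous days.

    events: iterable of (d, aid, uid) tuples, with d as a 'yyyy-MM-dd' string.
    Assumption: the window is target - window_days <= d < target.
    """
    parsed = [(date.fromisoformat(d), aid, uid) for d, aid, uid in events]
    # Target dates are the distinct event dates, as in the Pig script.
    target_dates = sorted({d for d, _, _ in parsed})
    apps = sorted({aid for _, aid, _ in parsed})

    result = {}
    for aid in apps:
        for target in target_dates:
            start = target - timedelta(days=window_days)
            users = {
                uid
                for d, a, uid in parsed
                if a == aid and start <= d < target
            }
            result[(aid, target.isoformat())] = len(users)
    return result
```

The output is a dictionary keyed by (aid, date), which corresponds to the (aid, date, result) rows I would like the Pig script to generate.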

After trying various approaches without success, my current script looks
like this:

________________________________________________________________

/**
 * describe events output:
 *
 * events: {d: chararray,aid: chararray,uid: chararray}
 */

eventDates = FOREACH events GENERATE d as targetDate;
dates      = DISTINCT eventDates;
crossed    = CROSS (GROUP events BY (aid)), dates;

/**
 * describe crossed output:
 *
 * crossed: {1-7::group: chararray,1-7::events: {(d: chararray,aid: chararray,uid: chararray)},dates::targetDate: chararray}
 */

result = FOREACH crossed {
    date        = ToDate(targetDate, 'yyyy-MM-dd');
    filtered    = FILTER events BY DaysBetween(ToDate(d, 'yyyy-MM-dd'), date) < 2
                                AND SecondsBetween(ToDate(d, 'yyyy-MM-dd'), date) > 0;
    uniqueUsers = DISTINCT filtered.uid;
    GENERATE group as aid, targetDate as date, COUNT(uniqueUsers) as result;
}

describe result;
dump result;
________________________________________________________________

At this point I get the following error:

2013-12-19 05:20:17,283 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<file script.pig, line 46, column 25> Invalid field projection. Projected field [targetDate] does not exist in schema: d:bytearray,aid:chararray,uid:chararray.

Line 46 is equivalent to:

    date = ToDate(targetDate, 'yyyy-MM-dd');


But if I hardcode the date instead of reading it from the "crossed" bag:

   date = ToDate('2013-12-01', 'yyyy-MM-dd');

It actually works.

It looks like, inside the nested block of a FOREACH, I can no longer
project the outer (first-level) fields.

Any idea what causes this? Or is there a better way to achieve the
same result?


Please forgive any mistakes; this is my first attempt at Pig scripting.
Any suggestions are highly appreciated.

Thanks and Regards,
Carlo

-- 
Carlo Di Fulco
