pig-user  

Re: piggybank apachelogparser.DateExtractor problem

David Vrensk
Wed, 17 Mar 2010 02:07:51 -0700

On Tue, Mar 16, 2010 at 19:58, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:

> Hello everybody,
> I've been trying to use
> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor from
> the piggybank that comes with Pig 0.6.0, but I don't seem to be able to set the
> output format.
> Whatever I use as the argument in the construct:
>
> DEFINE MyDateExtractor
> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('HH:mm:ss');
>
> I only ever get yyyy-MM-dd back.
> However, when I change DEFAULT_OUTGOING_DATE_FORMAT in
> main/java/org/apache/pig/piggybank/evaluation/util/apachelogparser/DateExtractor.java
> to something like 'yyyy-MM-dd-HH', it does output the right format.
> Am I doing something wrong?
>

I don't think so. I ran into the same problem a couple of weeks ago and
played around with the code, inserting some print/log statements. It turns
out that the arguments are only used in the initial constructor calls, made
while the Pig process is starting up. Once Pig reaches the point where it
actually uses the UDF, it creates new DateExtractor instances without passing
the arguments, so the default formats apply.

I found two ways around this:

1. Let the initial calls to the constructor store the format in a static
variable. This is brittle: every instance in the JVM shares that variable, so
two DEFINEs with different formats would overwrite each other.
2. Pass the date format as an extra argument in each call. This is what I
ended up doing (in my own DateExtractor, in my own UDF library). The end
result looks like this:

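import java.io.IOException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

public class DateExtractor extends EvalFunc<String> {

    // Defaults assumed to mirror piggybank's DateExtractor: Apache common
    // log timestamps in, yyyy-MM-dd out, all in GMT.
    private static final TimeZone gmt = TimeZone.getTimeZone("GMT");
    private final DateFormat defaultIncomingDateFormat =
        new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
    private final DateFormat defaultOutgoingDateFormat =
        new SimpleDateFormat("yyyy-MM-dd");

    // No-arg constructor only: Pig may re-create the UDF without its
    // DEFINE arguments, so the formats travel with each call instead.
    public DateExtractor() {
        defaultIncomingDateFormat.setTimeZone(gmt);
        defaultOutgoingDateFormat.setTimeZone(gmt);
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;

        // Optional 2nd field: output pattern; optional 3rd field: input pattern.
        DateFormat incomingDateFormat = defaultIncomingDateFormat;
        DateFormat outgoingDateFormat = defaultOutgoingDateFormat;
        if (input.size() > 1) {
            outgoingDateFormat = new SimpleDateFormat((String)input.get(1));
            outgoingDateFormat.setTimeZone(gmt);
        }
        if (input.size() > 2) {
            incomingDateFormat = new SimpleDateFormat((String)input.get(2));
            incomingDateFormat.setTimeZone(gmt);
        }

        String str = "";
        try {
            str = (String)input.get(0);
            Date date = incomingDateFormat.parse(str);
            return outgoingDateFormat.format(date);
        } catch (ParseException pe) {
            System.err.println("releware.pig.evaluation.DateExtractor: unable to parse date " + str);
            return null;
        } catch (Exception e) {
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }
}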

and is used like this (I think; I can't find the script that used it):

DEFINE Xdate com.releware.pig.evaluation.DateExtractor;

A = load ...;  -- load your log data here
B = foreach A generate Xdate(stupid_timestamp, 'MM-dd');
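If you want to check the date handling outside Pig first, here is a stand-alone sketch of the same conversion using only the JDK. The names DateConvertDemo and convert are made up for illustration, and the default patterns are assumed to match piggybank's (Apache common log timestamps in, yyyy-MM-dd out); this is not the piggybank code itself.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateConvertDemo {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");
    // Assumed defaults: Apache common log timestamp in, yyyy-MM-dd out.
    private static final String DEFAULT_IN = "dd/MMM/yyyy:HH:mm:ss Z";
    private static final String DEFAULT_OUT = "yyyy-MM-dd";

    // Build a formatter that reads and writes in GMT; Locale.US so "MMM" matches "Mar".
    private static DateFormat gmtFormat(String pattern) {
        DateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(GMT);
        return f;
    }

    // Hypothetical helper mirroring exec(): the output and input patterns are
    // supplied per call (null means "use the default").
    static String convert(String timestamp, String outPattern, String inPattern) {
        if (timestamp == null) return null;
        DateFormat in = gmtFormat(inPattern != null ? inPattern : DEFAULT_IN);
        DateFormat out = gmtFormat(outPattern != null ? outPattern : DEFAULT_OUT);
        try {
            Date date = in.parse(timestamp);
            return out.format(date);
        } catch (ParseException pe) {
            return null; // unparsable input yields null, as in exec()
        }
    }

    public static void main(String[] args) {
        String ts = "16/Mar/2010:19:58:00 +0100";
        System.out.println(convert(ts, null, null));       // 2010-03-16
        System.out.println(convert(ts, "MM-dd", null));    // 03-16
        System.out.println(convert(ts, "HH:mm:ss", null)); // 18:58:00 (GMT)
        System.out.println(convert("not a date", null, null)); // null
    }
}
```

Note that the HH:mm:ss result is 18:58:00 rather than 19:58:00, because the +0100 offset in the log line is converted to GMT on output.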

Hope this helps!

/David

-- 
David Vrensk
Systems developer, ICE House AB
Mobile: +46 703 74 69 00