pig-user  

Re: piggybank apachelogparser.DateExtractor problem

hc busy
Wed, 17 Mar 2010 10:29:50 -0700

Ahh, I see. The problem, of course, is that Pig is unable to parse that
(at least in 0.5), so what I ended up having to do is this:

 > temp_table = FOREACH table GENERATE *, '%Y-%m-%d' as format_string;
 > OUTPUT = FOREACH temp_table GENERATE *, my.udf.formatDate(format_string, date_field);


The problem with doing it this way is that it generates an extra column with
the value '%Y-%m-%d' in every row, then calls the UDF on the two columns,
which slows things down a bit. It would be great if we could get constants to
parse as UDF arguments; then David's original suggestion would work:

 > OUTPUT = FOREACH table GENERATE *, my.udf.formatDate('%Y-%m-%d', date_field);
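
On Dmitriy's caching point, here is a minimal sketch of what that could look
like, assuming the UDF translates the unix `date` format into a
java.text.SimpleDateFormat pattern. The class name CachedDateFormatter, the
toJavaPattern helper, and the handful of directives it handles are made up for
illustration; this is not the actual my.udf.formatDate code, and it leaves out
the Pig EvalFunc wrapper entirely. UTC and Locale.ENGLISH are hard-coded as
assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper; not the real my.udf.formatDate and not a Pig EvalFunc.
class CachedDateFormatter {
    // One SimpleDateFormat per distinct format string, built on first use.
    // SimpleDateFormat is not thread-safe, so use one instance of this class per thread.
    private final Map<String, SimpleDateFormat> cache = new HashMap<>();

    // Translates a small, assumed subset of unix `date` directives.
    static String toJavaPattern(String unixFormat) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < unixFormat.length(); i++) {
            char c = unixFormat.charAt(i);
            if (c != '%' || i + 1 == unixFormat.length()) {
                literal.append(c);          // ordinary character (or a trailing '%')
                continue;
            }
            char directive = unixFormat.charAt(++i);
            if (directive == '%') {         // "%%" is a literal percent sign
                literal.append('%');
                continue;
            }
            flushLiteral(literal, out);
            switch (directive) {
                case 'Y': out.append("yyyy"); break;
                case 'm': out.append("MM"); break;
                case 'd': out.append("dd"); break;
                case 'H': out.append("HH"); break;
                case 'M': out.append("mm"); break;
                case 'S': out.append("ss"); break;
                default:
                    throw new IllegalArgumentException("unsupported directive: %" + directive);
            }
        }
        flushLiteral(literal, out);
        return out.toString();
    }

    // Quotes pending literal text so SimpleDateFormat does not read letters as pattern codes.
    private static void flushLiteral(StringBuilder literal, StringBuilder out) {
        if (literal.length() > 0) {
            out.append('\'').append(literal.toString().replace("'", "''")).append('\'');
            literal.setLength(0);
        }
    }

    // Formats epoch milliseconds, reusing the formatter built for this format string.
    String format(String unixFormat, long epochMillis) {
        SimpleDateFormat formatter = cache.get(unixFormat);
        if (formatter == null) {
            formatter = new SimpleDateFormat(toJavaPattern(unixFormat), Locale.ENGLISH);
            formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
            cache.put(unixFormat, formatter);
        }
        return formatter.format(new Date(epochMillis));
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        CachedDateFormatter f = new CachedDateFormatter();
        System.out.println(f.format("%Y-%m-%d", 1268846990000L)); // 2010-03-17 (UTC)
    }
}
```

With a constant format string the map only ever holds one entry, so the
per-tuple cost is a single lookup rather than building a new formatter each time.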


On Wed, Mar 17, 2010 at 10:20 AM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Umm yeah that's what David Vrensk said
>
> creating a new dateformat for every processed tuple is suboptimal, though.
>
> Can cache it, of course.
>
> -D
>
> On Wed, Mar 17, 2010 at 10:12 AM, hc busy <hc.b...@gmail.com> wrote:
>
> > There is another way to do this. I wrote a UDF that takes a unix `date`
> > format and converts a date to a string
> >
> >
> > OUTPUT = FOREACH table GENERATE *, my.udf.formatDate('%Y-%m-%d', date_field);
> >
> > which works great, and allows you to write fairly general formatting code.
> >
> > On Wed, Mar 17, 2010 at 10:01 AM, Johannes Rußek <
> > johannes.rus...@io-consulting.net> wrote:
> >
> > > On 17.03.2010 17:43, Dmitriy Ryaboy wrote:
> > >
> > >> Ah, sorry, shouldn't have assumed.
> > >> Yes, on the jira in http://issues.apache.org/jira/browse/PIG -- just
> > >> click on "create new issue" and go from there.
> > >> Welcome to apache project stuff :).
> > >>
> > >> -D
> > >>
> > >>
> > >
> > > Hi Dmitriy,
> > > thank you and done: PIG-1303
> > > I will also file a bug against it to be able to set a different locale
> > > in the pig script, because the locale on my workstation is a German one,
> > > yet our apache logs are created with the standard English one in the
> > > dateformat. DateExtractor then fails to parse the log data since it's
> > > not in the German format. tbh, I've never seen an apache log in a
> > > different locale than English :)
> > > Johannes
> > >
> >
>