pig-user  

Re: piggybank apachelogparser.DateExtractor problem

Dmitriy Ryaboy
Wed, 17 Mar 2010 09:44:06 -0700

Ah, sorry, shouldn't have assumed.
Yes, in the JIRA at http://issues.apache.org/jira/browse/PIG -- just click
on "create new issue" and go from there.
Welcome to Apache project stuff :).

-D

On Wed, Mar 17, 2010 at 9:36 AM, Johannes Rußek <
johannes.rus...@io-consulting.net> wrote:

> Hi Dmitriy,
> Where would I open the ticket? Here:
> http://issues.apache.org/jira/browse/PIG ? Should I mail the developer
> list first?
> Sorry, I'm new to the Apache project stuff :)
> Johannes
>
>
> On 17.03.2010 16:30, Dmitriy Ryaboy wrote:
>
>> Yeah, that's weird. Especially the wrong constructor being called. Could you
>> open a ticket please?
>>
>> On Wed, Mar 17, 2010 at 8:11 AM, Johannes Rußek <
>> johannes.rus...@io-consulting.net> wrote:
>>
>>> Hi David!
>>> Thanks a lot for your detailed answer, I will try to use your UDF :)
>>> What bothers me, though, is that the DateExtractor appears to have worked
>>> as we expected at some point, since the docs say to use it like that and
>>> I could find a bunch of blog posts that pass the format to the
>>> constructor.
>>> Thanks anyway :)
>>> Johannes
>>>
>>> On 17.03.2010 10:07, David Vrensk wrote:
>>>
>>>> On Tue, Mar 16, 2010 at 19:58, Johannes Rußek <
>>>> johannes.rus...@io-consulting.net> wrote:
>>>>
>>>>> Hello everybody,
>>>>> I've been trying to use
>>>>> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
>>>>> from the piggybank that comes with Pig 0.6.0, but I don't seem to be able
>>>>> to set the output format.
>>>>> Whatever I pass as the constructor argument in the DEFINE:
>>>>>
>>>>> DEFINE MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('HH:mm:ss');
>>>>>
>>>>> I only ever get yyyy-MM-dd back.
>>>>> However, when I change DEFAULT_OUTGOING_DATE_FORMAT in
>>>>> main/java/org/apache/pig/piggybank/evaluation/util/apachelogparser/DateExtractor.java
>>>>> to something like 'yyyy-MM-dd-HH', it outputs the right format.
>>>>> Am I doing something wrong?
>>>>>
>>>> I don't think so.  I ran into the same problem a couple of weeks ago and
>>>> played around with the code, inserting some print/log statements.  It
>>>> turns out that the arguments are only used in the initial constructor
>>>> calls, when the pig process is starting; once pig reaches the point where
>>>> it would use the UDF, it creates new DateExtractors without passing the
>>>> arguments.
>>>>
>>>> I found two ways around this:
>>>>
>>>> 1. Let the initial calls to the constructor store the format in a static
>>>> variable.  This is brittle.
>>>> 2. Supply a date format with the actual calls.  This is what I ended up
>>>> doing (in my own DateExtractor that I created in my own UDF lib).  The
>>>> end result looks like this:
>>>>
>>>>     public DateExtractor() {}
>>>>
>>>>     @Override
>>>>     public String exec(Tuple input) throws IOException {
>>>>         if (input == null || input.size() == 0)
>>>>             return null;
>>>>
>>>>         // Optional 2nd and 3rd arguments override the default formats.
>>>>         DateFormat incomingDateFormat = defaultIncomingDateFormat;
>>>>         DateFormat outgoingDateFormat = defaultOutgoingDateFormat;
>>>>         if (input.size() > 1) {
>>>>             outgoingDateFormat = new SimpleDateFormat((String) input.get(1));
>>>>             outgoingDateFormat.setTimeZone(gmt);
>>>>         }
>>>>         if (input.size() > 2) {
>>>>             incomingDateFormat = new SimpleDateFormat((String) input.get(2));
>>>>             incomingDateFormat.setTimeZone(gmt);
>>>>         }
>>>>
>>>>         String str = "";
>>>>         try {
>>>>             str = (String) input.get(0);
>>>>             Date date = incomingDateFormat.parse(str);
>>>>             return outgoingDateFormat.format(date);
>>>>         } catch (ParseException pe) {
>>>>             System.err.println("releware.pig.evaluation.DateExtractor: unable to parse date " + str);
>>>>             return null;
>>>>         } catch (Exception e) {
>>>>             throw WrappedIOException.wrap("Caught exception processing input row ", e);
>>>>         }
>>>>     }
>>>>
>>>> and is used like this (hopefully; I can't find the script that used it):
>>>>
>>>> DEFINE Xdate com.com.releware.pig.evaluation.DateExtractor;
>>>>
>>>> A = load log;
>>>> B = foreach A generate Xdate(stupid_timestamp, 'MM-dd');
>>>>
>>>> Hope this helps!
>>>>
>>>> /David
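
The per-call workaround David describes can be tried outside Pig. What follows
is only a minimal sketch under stated assumptions, not the piggybank code or
David's actual class: the class name DateExtractorSketch, the extract() helper
and the default incoming pattern are made up for illustration, and a plain
String varargs list stands in for Pig's Tuple. The default outgoing pattern
yyyy-MM-dd is the one mentioned in the thread.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateExtractorSketch {
    // Assumption: an Apache access-log style timestamp as the default input.
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";
    // Default output pattern reported in the thread.
    static final String DEFAULT_OUTGOING = "yyyy-MM-dd";
    static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    /**
     * args[0] = timestamp, args[1] = optional outgoing pattern,
     * args[2] = optional incoming pattern. Returns null on bad input.
     */
    static String extract(String... args) {
        if (args == null || args.length == 0 || args[0] == null) {
            return null;
        }
        String outgoing = args.length > 1 ? args[1] : DEFAULT_OUTGOING;
        String incoming = args.length > 2 ? args[2] : DEFAULT_INCOMING;

        // The formats travel with each call, so nothing depends on how
        // (or how often) the object itself was constructed.
        DateFormat in = new SimpleDateFormat(incoming, Locale.ENGLISH);
        DateFormat out = new SimpleDateFormat(outgoing, Locale.ENGLISH);
        in.setTimeZone(GMT);
        out.setTimeZone(GMT);
        try {
            Date date = in.parse(args[0]);
            return out.format(date);
        } catch (ParseException pe) {
            return null; // unparseable timestamp
        }
    }

    public static void main(String[] unused) {
        String ts = "17/Mar/2010:09:44:06 -0700";
        System.out.println(extract(ts));             // default format
        System.out.println(extract(ts, "HH:mm:ss")); // per-call format
        System.out.println(extract(ts, "MM-dd"));
    }
}
```

Because the output is formatted in GMT, a timestamp with a -0700 offset comes
back seven hours later on the clock than it was written. Building a new
SimpleDateFormat on every call costs a little, but it sidesteps the fact that
SimpleDateFormat instances are not thread-safe.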