Dmitriy Ryaboy
Wed, 17 Mar 2010 08:31:25 -0700
Yeah, that's weird, especially the wrong constructor being called. Could you open a ticket, please?
On Wed, Mar 17, 2010 at 8:11 AM, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:
> Hi David!
> Thanks a lot for your detailed answer, I will try your UDF :)
> What bothers me, though, is that the DateExtractor apparently did work as
> expected at some point: the docs say to use it like that, and I found a
> bunch of blog posts that pass the format in the constructor.
> Thanks anyway :)
> Johannes
>
> On 17.03.2010 10:07, David Vrensk wrote:
>
>> On Tue, Mar 16, 2010 at 19:58, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:
>>
>>
>>
>>> Hello everybody,
>>> I've been trying to use
>>> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
>>> from the piggybank that comes with Pig 0.6.0, but I don't seem to be able
>>> to set the output format. Whatever I pass as the constructor argument:
>>>
>>> DEFINE MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('HH:mm:ss');
>>>
>>> I only ever get yyyy-MM-dd back.
>>> However, when I change DEFAULT_OUTGOING_DATE_FORMAT in
>>> main/java/org/apache/pig/piggybank/evaluation/util/apachelogparser/DateExtractor.java
>>> to something like 'yyyy-MM-dd-HH', it outputs the right format.
>>> Am I doing something wrong?
>>>
>>>
>>>
>> I don't think so. I ran into the same problem a couple of weeks ago and
>> played around with the code, inserting some print/log statements. It turns
>> out that the arguments are only used in the initial constructor calls, when
>> the Pig process is starting; once Pig reaches the point where it would
>> use the UDF, it creates new DateExtractors without passing the arguments.
>>
>> I found two ways around this:
>>
>> 1. Let the initial calls to the constructor store the format in a static
>> variable. This is brittle.
>> 2. Supply a date format with the actual calls. This is what I ended up
>> doing (in my own DateExtractor that I created in my own UDF lib). The end
>> result looks like this:
>>
>>     public DateExtractor() {}
>>
>>     @Override
>>     public String exec(Tuple input) throws IOException {
>>         if (input == null || input.size() == 0)
>>             return null;
>>
>>         // Optional second and third arguments override the default formats.
>>         DateFormat incomingDateFormat = defaultIncomingDateFormat;
>>         DateFormat outgoingDateFormat = defaultOutgoingDateFormat;
>>         if (input.size() > 1) {
>>             outgoingDateFormat = new SimpleDateFormat((String) input.get(1));
>>             outgoingDateFormat.setTimeZone(gmt);
>>         }
>>         if (input.size() > 2) {
>>             incomingDateFormat = new SimpleDateFormat((String) input.get(2));
>>             incomingDateFormat.setTimeZone(gmt);
>>         }
>>
>>         String str = "";
>>         try {
>>             str = (String) input.get(0);
>>             Date date = incomingDateFormat.parse(str);
>>             return outgoingDateFormat.format(date);
>>         } catch (ParseException pe) {
>>             System.err.println("releware.pig.evaluation.DateExtractor: unable to parse date " + str);
>>             return null;
>>         } catch (Exception e) {
>>             throw WrappedIOException.wrap("Caught exception processing input row ", e);
>>         }
>>     }
>>
>> and is used like this (hopefully—I can't find the script that used it):
>>
>> DEFINE Xdate com.com.releware.pig.evaluation.DateExtractor;
>>
>> A = load log;
>> B = foreach A generate Xdate(stupid_timestamp, 'MM-dd');
>>
>> Hope this helps!
>>
>> /David
>>
>>
>>
>
>
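
David's two observations (constructor arguments being honoured only in the first constructor calls, and the static-variable workaround being brittle) can be illustrated with a small plain-Java simulation. This is only a sketch under assumptions: ConstructorArgsDemo, Extractor, rememberedFormat and reinstantiate are hypothetical names, and the no-arg reflection call merely mimics the behaviour described in the thread rather than reproducing Pig's actual instantiation code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Plain-Java simulation; nothing here is Pig or piggybank code.
class ConstructorArgsDemo {

    /** Hypothetical stand-in for a UDF with a configurable output format. */
    static class Extractor {
        static final String DEFAULT_FORMAT = "yyyy-MM-dd";

        // Workaround 1: a static that remembers the last constructor argument.
        static String rememberedFormat = null;

        private final String format;

        // Path taken when the instance is re-created without arguments.
        public Extractor() {
            this.format = (rememberedFormat != null) ? rememberedFormat : DEFAULT_FORMAT;
        }

        // Path taken when the format is passed in, as in DEFINE ... ('HH:mm:ss').
        public Extractor(String format) {
            this.format = format;
            rememberedFormat = format;
        }

        public String exec(Date date) {
            SimpleDateFormat out = new SimpleDateFormat(format, Locale.ENGLISH);
            out.setTimeZone(TimeZone.getTimeZone("GMT"));
            return out.format(date);
        }
    }

    // Mimics a framework that instantiates the class through its no-arg constructor.
    static Extractor reinstantiate() {
        try {
            return Extractor.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate Extractor", e);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);

        // Nothing remembered yet: the default format is used.
        System.out.println(reinstantiate().exec(epoch));

        // One definition with a format: the static carries it over.
        new Extractor("HH:mm:ss");
        System.out.println(reinstantiate().exec(epoch));

        // A second definition overwrites the static for every later instance.
        new Extractor("yyyy");
        System.out.println(reinstantiate().exec(epoch));
    }
}
```

Run as a single file, this prints 1970-01-01, then 00:00:00, then 1970. The static does rescue the format, but a second definition with a different format silently changes it for every later instance, and a static set in one JVM would not be visible in another. That is one plausible reading of why the thread calls this approach brittle, and why passing the format with each call is the sturdier choice.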