Johannes Rußek
Wed, 17 Mar 2010 08:12:22 -0700
Hi David! Thanks a lot for your detailed answer, I will try to use your UDF :) What bothers me, though, is that the DateExtractor appears to have worked as expected at some point, since the docs say to use it like that and I could find a bunch of blog posts using it with the format in the constructor.
Thanks anyway :)
Johannes

On 17.03.2010 10:07, David Vrensk wrote:
On Tue, Mar 16, 2010 at 19:58, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:

Hello everybody, I've been trying to use org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor from the piggybank that comes with Pig 0.6.0, but I don't seem to be able to set the output format. Whatever I use as the argument in the constructor:

    DEFINE MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('HH:mm:ss');

I only ever get yyyy-MM-dd back. However, when I change DEFAULT_OUTGOING_DATE_FORMAT in main/java/org/apache/pig/piggybank/evaluation/util/apachelogparser/DateExtractor.java to something like 'yyyy-MM-dd-HH', it outputs the right format. Am I doing something wrong?

I don't think so. I ran into the same problem a couple of weeks ago and played around with the code, inserting some print/log statements. It turns out that the arguments are only used in the initial constructor calls, when the Pig process is starting. Once Pig reaches the point where it would use the UDF, it creates new DateExtractors without passing the arguments. I found two ways around this:

1. Let the initial calls to the constructor store the format in a static variable. This is brittle.
2. Supply a date format with the actual calls. This is what I ended up doing (in my own DateExtractor that I created in my own UDF lib).
The end result looks like this:

    public DateExtractor() {}

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;

        DateFormat incomingDateFormat = defaultIncomingDateFormat;
        DateFormat outgoingDateFormat = defaultOutgoingDateFormat;
        if (input.size() > 1) {
            outgoingDateFormat = new SimpleDateFormat((String) input.get(1));
            outgoingDateFormat.setTimeZone(gmt);
        }
        if (input.size() > 2) {
            incomingDateFormat = new SimpleDateFormat((String) input.get(2));
            incomingDateFormat.setTimeZone(gmt);
        }

        String str = "";
        try {
            str = (String) input.get(0);
            Date date = incomingDateFormat.parse(str);
            return outgoingDateFormat.format(date);
        } catch (ParseException pe) {
            System.err.println("releware.pig.evaluation.DateExtractor: unable to parse date " + str);
            return null;
        } catch (Exception e) {
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }

and is used like this (hopefully; I can't find the script that used it):

    DEFINE Xdate com.com.releware.pig.evaluation.DateExtractor;
    A = *load log;*
    B = foreach A generate Xdate(A.stupid_timestamp, 'MM-dd');

Hope this helps!

/David
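
For anyone who wants to try the per-call format idea without a Pig installation, here is a minimal standalone sketch of the same argument handling. It is not the piggybank class and not David's actual UDF: the class name DateExtractorSketch is hypothetical, a plain List<Object> stands in for Pig's Tuple, and the default incoming pattern (Apache access-log style) is an assumption. Only the yyyy-MM-dd default output is taken from the thread.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for the UDF: a List<Object> plays the role of Pig's Tuple.
final class DateExtractorSketch {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Assumed default input pattern (Apache access-log timestamps).
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";
    // Default output pattern, as reported in the thread.
    static final String DEFAULT_OUTGOING = "yyyy-MM-dd";

    // Builds a fresh formatter per call; SimpleDateFormat is not thread-safe.
    private static DateFormat gmtFormat(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(GMT);
        return format;
    }

    // input = (timestamp [, outgoing pattern [, incoming pattern]])
    static String exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        String outPattern = input.size() > 1 ? (String) input.get(1) : DEFAULT_OUTGOING;
        String inPattern = input.size() > 2 ? (String) input.get(2) : DEFAULT_INCOMING;
        String str = (String) input.get(0);
        if (str == null) {
            return null;
        }
        try {
            Date date = gmtFormat(inPattern).parse(str);
            return gmtFormat(outPattern).format(date);
        } catch (ParseException pe) {
            // Mirrors the UDF above: an unparseable row yields null instead of failing.
            return null;
        }
    }

    public static void main(String[] args) {
        String ts = "16/Mar/2010:19:58:00 +0000";
        System.out.println(exec(Arrays.<Object>asList(ts)));
        System.out.println(exec(Arrays.<Object>asList(ts, "HH:mm:ss")));
        System.out.println(exec(Arrays.<Object>asList("2010-03-16 19:58", "MM-dd", "yyyy-MM-dd HH:mm")));
    }
}
```

Because the patterns travel with each call rather than with the constructor, the result does not depend on how or when the framework instantiates the class, which is the point of the second workaround.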