Dmitriy Ryaboy
Wed, 17 Mar 2010 09:44:06 -0700
Ah, sorry, shouldn't have assumed. Yes, on the JIRA at http://issues.apache.org/jira/browse/PIG -- just click on "create new issue" and go from there. Welcome to Apache project stuff :).
-D

On Wed, Mar 17, 2010 at 9:36 AM, Johannes Rußek <johannes.rus...@io-consulting.net> wrote:

> Hi Dmitriy,
> where would I open the ticket?
> http://issues.apache.org/jira/browse/PIG here? Should I mail the developer
> list first?
> Sorry, I'm new to the Apache project stuff :)
> Johannes
>
> On 17.03.2010 16:30, Dmitriy Ryaboy wrote:
>
>> Yeah that's weird. Especially the wrong constructor being called. Could
>> you open a ticket please?
>>
>> On Wed, Mar 17, 2010 at 8:11 AM, Johannes Rußek
>> <johannes.rus...@io-consulting.net> wrote:
>>
>>> Hi David!
>>> Thanks a lot for your detailed answer, I will try to use your UDF :)
>>> What bothers me though is that the DateExtractor appears to have worked
>>> as we expected at some point in time, since the docs say to use it like
>>> that and I could find a bunch of blog posts using it with the format in
>>> the constructor.
>>> Thanks anyway :)
>>> Johannes
>>>
>>> On 17.03.2010 10:07, David Vrensk wrote:
>>>
>>>> On Tue, Mar 16, 2010 at 19:58, Johannes Rußek
>>>> <johannes.rus...@io-consulting.net> wrote:
>>>>
>>>>> Hello everybody,
>>>>> I've been trying to use
>>>>> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
>>>>> from piggybank that comes with Pig 0.6.0, but I don't seem to be able
>>>>> to set the output format.
>>>>> Whatever I use as the argument in the constructor:
>>>>>
>>>>> DEFINE MyDateExtractor
>>>>> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('HH:mm:ss');
>>>>>
>>>>> I only ever get yyyy-MM-dd back.
>>>>> However, when I change DEFAULT_OUTGOING_DATE_FORMAT in
>>>>>
>>>>> main/java/org/apache/pig/piggybank/evaluation/util/apachelogparser/DateExtractor.java
>>>>>
>>>>> to something like 'yyyy-MM-dd-HH' it is able to output the right
>>>>> format.
>>>>> Am I doing something wrong?
>>>>
>>>> I don't think so. I ran into the same problem a couple of weeks ago,
>>>> and played around with the code, inserting some print/log statements.
>>>> It turns out that the arguments are only used in the initial
>>>> constructor calls, when the pig process is starting, but once pig
>>>> reaches the point where it would use the udf, it creates new
>>>> DateExtractors without passing the arguments.
>>>>
>>>> I found two ways around this:
>>>>
>>>> 1. Let the initial calls to the constructor store the format in a
>>>>    static variable. This is brittle.
>>>> 2. Supply a date format with the actual calls. This is what I ended up
>>>>    doing (in my own DateExtractor that I created in my own UDF lib).
>>>>    The end result looks like this:
>>>>
>>>> public DateExtractor() {}
>>>>
>>>> @Override
>>>> public String exec(Tuple input) throws IOException {
>>>>     if (input == null || input.size() == 0)
>>>>         return null;
>>>>
>>>>     DateFormat incomingDateFormat = defaultIncomingDateFormat;
>>>>     DateFormat outgoingDateFormat = defaultOutgoingDateFormat;
>>>>     if (input.size() > 1) {
>>>>         outgoingDateFormat = new SimpleDateFormat((String)input.get(1));
>>>>         outgoingDateFormat.setTimeZone(gmt);
>>>>     }
>>>>     if (input.size() > 2) {
>>>>         incomingDateFormat = new SimpleDateFormat((String)input.get(2));
>>>>         incomingDateFormat.setTimeZone(gmt);
>>>>     }
>>>>
>>>>     String str = "";
>>>>     try {
>>>>         str = (String)input.get(0);
>>>>         Date date = incomingDateFormat.parse(str);
>>>>         return outgoingDateFormat.format(date);
>>>>     } catch (ParseException pe) {
>>>>         System.err.println("releware.pig.evaluation.DateExtractor: unable to parse date " + str);
>>>>         return null;
>>>>     } catch (Exception e) {
>>>>         throw WrappedIOException.wrap("Caught exception processing input row ", e);
>>>>     }
>>>> }
>>>>
>>>> and is used like this (hopefully; I can't find the script that used it):
>>>>
>>>> DEFINE Xdate com.com.releware.pig.evaluation.DateExtractor;
>>>>
>>>> A = *load log;*
>>>> B = foreach A generate Xdate(A.stupid_timestamp, 'MM-dd');
>>>>
>>>> Hope this helps!
>>>>
>>>> /David
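
The second workaround quoted above (supplying the format with each call rather than through the constructor) can be tried outside of Pig. Below is a minimal sketch of that idea only, not the piggybank code: the class name DateReformatSketch, the method reformat, and the incoming pattern dd/MMM/yyyy:HH:mm:ss Z are assumptions made for illustration, and a plain List<String> stands in for Pig's Tuple. Only the yyyy-MM-dd default output comes from the thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical stand-alone sketch; no Pig classes are involved. */
class DateReformatSketch {
    // Assumed default for Apache access-log timestamps (not confirmed in the thread).
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";
    // The thread reports yyyy-MM-dd as the default output.
    static final String DEFAULT_OUTGOING = "yyyy-MM-dd";
    static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    /**
     * input.get(0): the timestamp string
     * input.get(1): optional outgoing pattern
     * input.get(2): optional incoming pattern
     * Returns null for empty input or an unparseable timestamp.
     */
    static String reformat(List<String> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        String outPattern = input.size() > 1 ? input.get(1) : DEFAULT_OUTGOING;
        String inPattern = input.size() > 2 ? input.get(2) : DEFAULT_INCOMING;

        // Formats are built per call, so nothing depends on constructor arguments.
        SimpleDateFormat in = new SimpleDateFormat(inPattern, Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat(outPattern, Locale.ENGLISH);
        in.setTimeZone(GMT);
        out.setTimeZone(GMT);

        try {
            Date date = in.parse(input.get(0));
            return out.format(date);
        } catch (ParseException pe) {
            return null;
        }
    }

    public static void main(String[] args) {
        String ts = "17/Mar/2010:09:44:06 -0700";
        System.out.println(reformat(Arrays.asList(ts)));             // prints 2010-03-17
        System.out.println(reformat(Arrays.asList(ts, "HH:mm:ss"))); // prints 16:44:06 (GMT)
    }
}
```

Because the pattern travels with every call, it does not matter which constructor Pig ends up invoking on the backend; the cost is building a SimpleDateFormat per row, which a real UDF might want to cache.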