Mridul Muralidharan
Thu, 10 Apr 2008 02:39:11 -0700
Mridul Muralidharan wrote:
Hi Michael,

Not sure about the character escaping, but I do have my UDFs in jars independent of the Pig jars, and that works fine for me. You might want to check for path issues?
Also check whether there is an empty constructor (or no constructor at all) for the UDF. IIRC, Pig uses the null constructor to create the UDF.

Mridul
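To illustrate the constructor point: an error such as "could not instantiate ... with arguments '[]'" is what one would expect when a class is created reflectively with no arguments but offers no accessible no-argument constructor. The sketch below is plain Java and is not Pig's actual loading code; the names NoArgConstructorDemo, WithDefaultCtor, WithoutDefaultCtor and instantiate are made up for the example.

```java
import java.lang.reflect.Constructor;

public class NoArgConstructorDemo {
    // Hypothetical stand-in for a UDF that keeps a public no-argument constructor.
    public static class WithDefaultCtor {
        public WithDefaultCtor() {}
    }

    // Hypothetical stand-in for a UDF that only declares a constructor taking arguments.
    public static class WithoutDefaultCtor {
        public WithoutDefaultCtor(String regex) {}
    }

    // Tries to create an instance with no arguments; returns null when that is not possible.
    public static Object instantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("with no-arg constructor: "
                + (instantiate(WithDefaultCtor.class) != null));
        System.out.println("without no-arg constructor: "
                + (instantiate(WithoutDefaultCtor.class) != null));
    }
}
```

If the UDF needs configuration such as a regex, one option under this assumption is to keep a public no-argument constructor and pass the values as function arguments in the query, as the LastPageExtractor calls quoted below already do.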
Regards,
Mridul

Michael Harris wrote:

I guess my last message was obvious/stupid since I am not getting any responses, but hopefully I won't be 0/2. I love using Pig and I think it's a fantastic tool for creating complex, map-reduce programs quickly, but that said I am having 2 problems in addition to the one below. Hopefully I am just missing something easy and someone can shoot me a quick response.

I have written my own eval func that extracts events from our event log. It then splits the event by some arbitrary regex and then finds the last match from that event that does not match another regex. The queries are as follows.

    eventlog = LOAD '/user/hadoop/index8mbGZnotes/{1205478000254_1205857683529.gz,1205857686408_1206295646386.gz,1206295646442_1206757710701.gz,1206757712403_1207113039900.gz,1207113039930_1207205997234.gz}' USING PigStorage(' ');
    filterDate = FILTER eventlog BY $1 >= '1204358400000' AND $1 <= '1209625200000';
    filterCh = FILTER filterDate BY $15 eq 'Sony' OR $15 eq 'Dell' OR $15 eq 'HP' ;
    filter1 = FILTER filterCh BY ($5 == 11 AND $6 == 15 AND $7 == 406 ) ;
    filtered = FOREACH filter1 GENERATE LastPageExtractor($8,'.*(ui/cancel.*)|(.*ui/error.*)','[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}'), $15;
    grouped = GROUP filtered BY ($0, $1);
    resultUnordered = FOREACH grouped GENERATE FLATTEN(group), FLATTEN(COUNT(filtered)) PARALLEL 14;

The func is LastPageExtractor(inputValue, excludeRegex, splitRegex)

This all works fine, but I would like to change my split regex to

    \\|+[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}

however when I do that I get this:

    Exception in thread "Thread-6" org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 1, column 93. Encountered: "|" (124), after : "\'\\"

Is there some special escape sequence I should know about? I searched escape in PigLatin Wiki and found nothing.
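One possible way around the escaping question is a character class: in java.util.regex, `[|]+` matches one or more pipe characters without needing a backslash. Whether the Pig parser accepts that string is an assumption and is not tested here; the sketch below only checks that the two patterns split the same way in Java. The class name SplitRegexDemo and the sample event string are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitRegexDemo {
    // Timestamp part of the split regex, copied from the query.
    static final String TIMESTAMP =
            "[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}";

    // In Java source the literal "\\|+" is the regex \|+ (one or more pipes).
    public static final String ESCAPED = "\\|+" + TIMESTAMP;

    // Same meaning, written without any backslash.
    public static final String CHAR_CLASS = "[|]+" + TIMESTAMP;

    // Splits an event string on every match of the given regex.
    public static String[] split(String event, String regex) {
        return Pattern.compile(regex).split(event);
    }

    public static void main(String[] args) {
        // Invented sample event, not taken from the real log.
        String event =
                "ui/start|10:15:30 04/02/2008ui/plans||10:16:02 04/02/2008ui/cancel";
        System.out.println(Arrays.toString(split(event, ESCAPED)));
        System.out.println(Arrays.toString(split(event, CHAR_CLASS)));
    }
}
```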
The second problem I have is I am not able to register jars/funcs without packaging them into the pig.jar in the org.apache.pig.impl.builtin package. I have tried everything I can think of and everything in the documentation. I register the jar with PigServer.registerJar and try to use the fully qualified function name, but all the task trackers fail with:

    java.lang.RuntimeException: could not instantiate 'telespree.analytics.pig.LastPageExtractor' with arguments '[]'

I do:

    server.registerJar("c:\\telespree.jar");

and

    filtered = FOREACH filter1 GENERATE telespree.analytics.pig.LastPageExtractor($8,'.*(ui/cancel.*)|(.*ui/error.*)','[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}'), $15;");

I even tried to put these functions in the default package in pig.jar since I saw in the code you do lookups with

    packageImportList.add("");
    packageImportList.add("org.apache.pig.builtin.");
    packageImportList.add("com.yahoo.pig.yst.sds.ULT.");
    packageImportList.add("org.apache.pig.impl.builtin.");

So I figured using the "" import would find my function, however alas I get the same error:

    java.lang.RuntimeException: could not instantiate 'LastPageExtractor' with arguments '[]'

However if I package them in org.apache.pig.impl.builtin it all works fine.

Any help on these 3 areas would be much appreciated!

-Michael

-----Original Message-----
From: Michael Harris [EMAIL PROTECTED]
Sent: Wednesday, April 02, 2008 10:47 AM
To: pig-user@incubator.apache.org
Subject: MapReduceLauncher static fields

Hello,

I have written a pig application that does a fixed set of queries on-demand through a web interface. I am trying to get the progress of the queries from the PigServer, but I have noticed that the source of the progress data is all static fields in the MapReduceLauncher. Clearly my webapp must be able to handle multiple concurrent pig queries (and be thread-safe) and I would like to report the progress of each individual query (job set) to the end user.
Do these static fields indicate that I would get the progress of multiple concurrent queries initiated by different PigServer instances, or would I get the overall progress of the MapReduceLauncher for all queries currently being executed?

Thanks,
Michael
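As background to the concern about static fields, the sketch below contrasts a progress value held in a static field, which every instance in the JVM shares, with progress tracked per query in a map. It says nothing about how MapReduceLauncher actually behaves; ProgressDemo, StaticProgressLauncher, QueryProgressRegistry and the query IDs are hypothetical names for illustration only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ProgressDemo {
    // Hypothetical launcher whose progress lives in a static field:
    // all instances read and write the same value, so the last writer wins.
    public static class StaticProgressLauncher {
        private static volatile double progress;

        public void report(double p) { progress = p; }
        public double read() { return progress; }
    }

    // Hypothetical alternative: progress stored per query ID in a thread-safe map.
    public static class QueryProgressRegistry {
        private final ConcurrentMap<String, Double> byQuery = new ConcurrentHashMap<>();

        public void report(String queryId, double p) { byQuery.put(queryId, p); }
        public double read(String queryId) { return byQuery.getOrDefault(queryId, 0.0); }
    }

    public static void main(String[] args) {
        StaticProgressLauncher first = new StaticProgressLauncher();
        StaticProgressLauncher second = new StaticProgressLauncher();
        first.report(0.25);
        second.report(0.75);
        // Both launchers now see the value written last.
        System.out.println(first.read() + " " + second.read());

        QueryProgressRegistry registry = new QueryProgressRegistry();
        registry.report("query-1", 0.25);
        registry.report("query-2", 0.75);
        // Each query keeps its own value.
        System.out.println(registry.read("query-1") + " " + registry.read("query-2"));
    }
}
```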