Alan Gates
Thu, 25 Sep 2008 13:56:29 -0700
http://incubator.apache.org/pig/version_control.html

Alan.

Earl Cahill wrote:
Alan,

Thanks, will take a look tonight. I guess I can just check out the source? Right now I am doing everything with the jars from the posted tutorial tarball (http://wiki.apache.org/pig-data/attachments/PigTutorial/attachments/pigtutorial.tar.gz). Is there a better place to get the code?

Thanks,
Earl

----- Original Message ----
From: Alan Gates <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
Sent: Thursday, September 25, 2008 9:50:47 AM
Subject: Re: arbitrary LOADing and parsing

I think what you want here is a load function rather than an eval function. Check out org.apache.pig.LoadFunc. Then your pig latin would look like:

    raw = LOAD 'access_log' USING com.loghelper.CommonLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes);

Take a look at org.apache.pig.builtin.PigStorage. You should be able to reuse all of this except for Tuple getNext(). For that function, once you get the line from in.readLine you'll need to do the parsing yourself.

Alan.

Earl Cahill wrote:

I would like to parse a standard access log and get named variables back. I am thinking I need to read in all the lines, then send them through my parsing function. Perhaps the two steps can be combined, but something like

    preraw = LOAD 'access_log' USING PigStorage() AS (line);
    raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line);

So I have CommonLogParser parsing the line well, but I don't know what to put into output so that I can do this

    raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line) AS remoteAddr, remoteLogname, user, time, method, uri, proto, bytes;

Extending EvalFunc<DataBag>, I tried doing this

    Tuple tuple = new Tuple();
    String remoteAddr = commonLogMatcher.group(1);
    output.add(new Tuple(remoteAddr));
    ...
    output.add(tuple);

to no avail, along with several other such failing schemes (including extending EvalFunc<DataMap>). Or perhaps there are already parsers that will parse a standard access log?

Ideas?

Thanks,
Earl
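
For the parsing step mentioned in the thread (taking the line returned by in.readLine and splitting it yourself inside getNext()), a minimal sketch in plain Java might look like the following. It is only an illustration: the class name CommonLogLine, the parse method, and the regular expression are assumptions and not part of Pig, and the Pig-specific work of wrapping the fields in a Tuple is left out because it depends on the Pig version in use. The sketch assumes the usual Common Log Format layout, which also carries a status code. Since the field list in the thread has no status column, the status is matched but not returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: splits one Common Log Format line
// into the eight fields named in the thread's LOAD ... AS (...) clause.
class CommonLogLine {
    // Assumed layout:
    // remoteAddr remoteLogname user [time] "method uri proto" status bytes
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)$");

    // Returns remoteAddr, remoteLogname, user, time, method, uri, proto, bytes
    // in that order, or null if the line does not match the assumed layout.
    static List<String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<String>();
        for (int i = 1; i <= 7; i++) {
            fields.add(m.group(i));
        }
        fields.add(m.group(9)); // bytes; group 8 (status) is skipped
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(parse(line));
    }
}
```

Inside a load function, each returned string would then be appended to the tuple that getNext() hands back, and a null result could be skipped or logged, depending on how malformed lines should be treated.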