Earl Cahill
Thu, 25 Sep 2008 16:58:38 -0700
So I am in getNext, and I have the lines parsed and my variables all set. If I have variables called, say, remoteAddr, remoteLogname, user, time, method, uri, proto, bytes, what do I do next? What do I return? 'Fraid the Tuples and the like are still rather new to me.

Thanks,
Earl

----- Original Message ----
From: Alan Gates <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
Sent: Thursday, September 25, 2008 2:53:28 PM
Subject: Re: arbitrary LOADing and parsing

http://incubator.apache.org/pig/version_control.html

Alan.

Earl Cahill wrote:
> Alan,
>
> Thanks, will take a look tonight. I guess I can just check out the source?
> Right now I am doing everything with the jars from the posted tutorial
> (http://wiki.apache.org/pig-data/attachments/PigTutorial/attachments/pigtutorial.tar.gz);
> is there a better place to get the code?
>
> Thanks,
> Earl
>
>
> ----- Original Message ----
> From: Alan Gates <[EMAIL PROTECTED]>
> To: pig-user@incubator.apache.org; Earl Cahill <[EMAIL PROTECTED]>
> Sent: Thursday, September 25, 2008 9:50:47 AM
> Subject: Re: arbitrary LOADing and parsing
>
> I think what you want here is a load function rather than an eval
> function. Check out org.apache.pig.LoadFunc. Then your Pig Latin would
> look like:
>
> raw = LOAD 'access_log' USING com.loghelper.CommonLogLoader AS
> (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes);
>
> Take a look at org.apache.pig.builtin.PigStorage. You should be able to
> reuse all of this except for Tuple getNext(). For that function, once
> you get the line from in.readLine, you'll need to do the parsing yourself.
>
> Alan.
>
> Earl Cahill wrote:
>
>> I would like to parse a standard access log and get named variables back.
>> I think I need to read in all the lines, then send them through my parsing
>> function. Perhaps the two steps can be combined, but something like:
>>
>> preraw = LOAD 'access_log' USING PigStorage() AS (line);
>> raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line);
>>
>> So I have CommonLogParser parsing the line well, but I don't know what to
>> put into output so that I can do this:
>>
>> raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line) AS
>> remoteAddr, remoteLogname, user, time, method, uri, proto, bytes;
>>
>> Extending EvalFunc<DataBag>, I tried doing this
>>
>> Tuple tuple = new Tuple();
>>
>> String remoteAddr = commonLogMatcher.group(1);
>> output.add(new Tuple(remoteAddr));
>> ...
>> output.add(tuple);
>>
>> to no avail, along with several other failing schemes (including extending
>> EvalFunc<DataMap>).
>>
>> Or perhaps there are already parsers that will parse a standard access log?
>>
>> Ideas?
>>
>> Thanks,
>> Earl
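A minimal sketch of the "do the parsing yourself" step inside getNext, assuming the input is Apache Common Log Format. The class name CommonLogLineParser, the regular expression and the sample line are illustrative assumptions, not Earl's actual CommonLogParser or commonLogMatcher. It turns one line from in.readLine into a list of values in the same order as the AS clause (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes), so each value lines up with a named field. How that list then becomes the Tuple returned by getNext depends on the Pig version: later releases, for example, offer TupleFactory.getInstance().newTuple(fields), but the 2008-era Tuple API discussed in this thread may differ, so check the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommonLogLineParser {
    // Hypothetical pattern for Common Log Format:
    //   host ident authuser [date] "method uri protocol" status bytes
    // Matcher.matches() requires the whole line to match, so no ^/$ anchors.
    private static final Pattern COMMON_LOG = Pattern.compile(
        "(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)");

    /**
     * Returns the fields in AS-clause order
     * (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes),
     * or null if the line does not look like a Common Log Format entry.
     */
    public static List<String> parse(String line) {
        Matcher m = COMMON_LOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>(8);
        // Groups 1-7: remoteAddr, remoteLogname, user, time, method, uri, proto
        for (int g = 1; g <= 7; g++) {
            fields.add(m.group(g));
        }
        // Group 8 is the HTTP status code; the schema in this thread has no
        // field for it, so it is matched but not emitted.
        fields.add(m.group(9)); // bytes ("-" when no body was sent)
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        // prints [127.0.0.1, -, frank, 10/Oct/2000:13:55:36 -0700, GET, /apache_pb.gif, HTTP/1.0, 2326]
        System.out.println(parse(line));
    }
}
```

One design note: a LoadFunc's getNext uses a null return to signal that there is no more input, so a loader built on this sketch would likely skip (or map to nulls) any line for which parse returns null rather than passing that null straight back.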