Alan Gates
Thu, 25 Sep 2008 08:54:01 -0700
raw = LOAD 'access_log' USING com.loghelper.CommonLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes);
Take a look at org.apache.pig.builtin.PigStorage. You should be able to reuse all of it except for Tuple getNext(). For that function, once you get the line from in.readLine, you'll need to do the parsing yourself.
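
The "do the parsing yourself" step might look roughly like the sketch below. It is plain Java and deliberately leaves out the Pig side (copying the fields into the Tuple that getNext() returns), since that depends on the Pig version in use. The class name CommonLogLineParser and its parse method are hypothetical, and the sketch assumes standard Common Log Format lines. It skips the status code because the LOAD schema above lists only eight names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits one Common Log Format line into eight fields. */
class CommonLogLineParser {
    // Expected layout: host ident user [time] "method uri proto" status bytes
    // The status code is matched but not captured (assumption: the schema
    // in the LOAD statement has no column for it).
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" \\d{3} (\\S+)$");

    /**
     * Returns remoteAddr, remoteLogname, user, time, method, uri, proto, bytes
     * in that order, or null if the line does not match the expected layout.
     */
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        Matcher m = COMMON_LOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // capture groups are 1-based
        }
        return fields;
    }
}
```

Inside a loader's getNext(), the idea would be to call something like CommonLogLineParser.parse(line) on each line read and append the eight strings to the output tuple in order, deciding separately what to do with lines where it returns null.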
Alan.

Earl Cahill wrote:
I would like to parse a standard access log and get named variables back. Thinking I need to read in all the lines, then send them through my parsing function. Perhaps the two steps can be combined, but something like

    preraw = LOAD 'access_log' USING PigStorage() AS (line);
    raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line);

So I have CommonLogParser parsing the line well, but I don't know what to put into output so that I can do this

    raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line) AS remoteAddr, remoteLogname, user, time, method, uri, proto, bytes;

Extending EvalFunc<DataBag> I tried doing this

    Tuple tuple = new Tuple();
    String remoteAddr = commonLogMatcher.group(1);
    output.add(new Tuple(remoteAddr));
    ...
    output.add(tuple);

to no avail and several other such failing schemes (including extending EvalFunc<DataMap>). Or perhaps there are already parsers that will parse a standard access log?

Ideas?

Thanks,
Earl