Earl Cahill
Thu, 25 Sep 2008 00:01:39 -0700
I would like to parse a standard access log and get named variables back.
I'm thinking I need to read in all the lines, then send each one through my parsing
function. Perhaps the two steps can be combined, but something like this:
preraw = LOAD 'access_log' USING PigStorage() AS (line);
raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line);
CommonLogParser parses the line correctly, but I don't know what to put
into the output so that I can do this:
raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line) AS
remoteAddr, remoteLogname, user, time, method, uri, proto, bytes;
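For reference, the regex side of the parsing step might look something like the sketch below. It is plain Java with no Pig classes, so it only covers the matching, not the UDF plumbing; the class name CommonLogSketch and the pattern are assumptions, not the actual CommonLogParser. It returns the fields in the order of the AS clause above (skipping the status code, which that clause doesn't name), or null when a line doesn't match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the parsing step; no Pig classes involved.
public class CommonLogSketch {
    // host ident authuser [date] "method uri proto" status bytes
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)");

    /** Returns the fields in AS-clause order, or null if the line does not match. */
    public static List<String> parse(String line) {
        Matcher m = COMMON_LOG.matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        // groups 1..7: remoteAddr, remoteLogname, user, time, method, uri, proto
        for (int g = 1; g <= 7; g++) {
            fields.add(m.group(g));
        }
        // group 8 is the status code, which the AS clause does not name, so it is skipped
        fields.add(m.group(9)); // bytes
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(parse(line));
    }
}
```

Because the AS clause assigns names by position, the important part is emitting the fields in a fixed order, one record per log line.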
Extending EvalFunc<DataBag>, I tried this:
Tuple tuple = new Tuple();
String remoteAddr = commonLogMatcher.group(1);
output.add(new Tuple(remoteAddr));
...
output.add(tuple);
That didn't work, and neither did several other schemes I tried (including
extending EvalFunc<DataMap>).
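One possible source of trouble in the attempt above is the shape of what goes into the bag: each output.add(new Tuple(remoteAddr)) call adds a separate one-field tuple, so eight fields would become eight rows rather than one row with eight columns. The sketch below models that difference with plain Java lists, a List<List<String>> standing in for a bag and a List<String> for a tuple. It is an analogy under that assumption, not Pig's actual data model; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Analogy only: List<List<String>> plays the role of a bag, List<String> a tuple.
public class BagShapeSketch {
    // One single-field "tuple" per value: the bag ends up with one row per field.
    public static List<List<String>> rowPerField(List<String> values) {
        List<List<String>> bag = new ArrayList<>();
        for (String v : values) {
            bag.add(Collections.singletonList(v));
        }
        return bag;
    }

    // All values in one "tuple": the bag holds a single row with one column per field.
    public static List<List<String>> oneRow(List<String> values) {
        List<List<String>> bag = new ArrayList<>();
        bag.add(new ArrayList<>(values));
        return bag;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("127.0.0.1", "-", "frank",
            "10/Oct/2000:13:55:36 -0700", "GET", "/apache_pb.gif", "HTTP/1.0", "2326");
        List<List<String>> perField = rowPerField(values);
        List<List<String>> single = oneRow(values);
        System.out.println(perField.size() + " rows of " + perField.get(0).size() + " column");
        System.out.println(single.size() + " row of " + single.get(0).size() + " columns");
    }
}
```

Presumably only the second shape gives a positional AS clause eight columns to name.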
Or are there already parsers for the standard access log format?
Ideas?
Thanks,
Earl