pig-user  

arbitrary LOADing and parsing

Earl Cahill
Thu, 25 Sep 2008 00:01:39 -0700

I would like to parse a standard access log and get named variables back.  
I think I need to read in all the lines and then send them through my parsing 
function.  Perhaps the two steps can be combined, but something like

preraw = LOAD 'access_log' USING PigStorage() AS (line);
raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line);

CommonLogParser parses the line fine, but I don't know what to put into the 
output so that I can do this:

raw = FOREACH preraw GENERATE com.loghelper.CommonLogParser(line) AS 
remoteAddr, remoteLogname, user, time, method, uri, proto, bytes;

Extending EvalFunc<DataBag>, I tried doing this:

Tuple tuple = new Tuple();

String remoteAddr = commonLogMatcher.group(1);
output.add(new Tuple(remoteAddr));
...
output.add(tuple);

to no avail. I also tried several other schemes that failed (including 
extending EvalFunc<DataMap>).
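
For reference, the parsing step on its own looks roughly like the sketch below. 
It is plain Java with no Pig classes. The class name CommonLogFields and the 
regular expression are illustrative assumptions, not the actual 
CommonLogParser code. It assumes the standard Common Log Format and returns 
the eight fields in the order named above; the status code is matched but not 
returned, since it is not in that list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the parsing logic; not the real CommonLogParser.
class CommonLogFields {

    // Assumed layout: host ident user [time] "method uri proto" status bytes
    // The three-digit status is matched but deliberately not captured.
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" \\d{3} (\\S+)");

    // Returns remoteAddr, remoteLogname, user, time, method, uri, proto, bytes
    // (in that order), or null if the line does not match the assumed layout.
    static String[] parse(String line) {
        Matcher m = COMMON_LOG.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // regex groups are 1-based
        }
        return fields;
    }
}
```

What I can't work out is how to hand those eight values back to Pig so that 
each one gets its own name.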

Or perhaps there is already a parser for standard access logs?

Ideas?

Thanks,
Earl