Niels Basjes created PIG-4639:
---------------------------------
Summary: Add better parser for Apache HTTPD access log.
Key: PIG-4639
URL: https://issues.apache.org/jira/browse/PIG-4639
Project: Pig
Issue Type: Improvement
Components: piggybank
Affects Versions: 0.15.0
Reporter: Niels Basjes
Assignee: Niels Basjes
Fix For: 0.16.0
Currently there are two parsers for Apache log files in piggybank. They support
only the 'combined' and 'common' LogFormats, and they extract only the basic
fields.
This is a proposed patch to add the existing
https://github.com/nielsbasjes/logparser (Apache 2.0 license) to piggybank as an
'out of the box' parser. It supports (almost) all LogFormat specifiers and so
can parse (almost) all custom LogFormats used in production scenarios.
This parser also goes much deeper: it can extract nested values such as the
value of a cookie or the value of a query string parameter.
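
To illustrate what "going deeper" means, here is a minimal standalone sketch in
Python. It is not the logparser API and not the proposed piggybank loader. It
assumes a hypothetical custom LogFormat (the 'combined' format with an extra
trailing "%{Cookie}i" field), and the names LINE_RE and parse_line are made up
for this example. It only shows the idea of first splitting a line into fields
and then drilling into the request URI and the cookie header.

```python
import re
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Assumed (hypothetical) LogFormat: 'combined' plus a trailing "%{Cookie}i":
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)


def parse_line(line):
    """Split one log line into fields, then extract nested values."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    record = match.groupdict()

    # Query string parameters from the request URI (first value per name).
    query = parse_qs(urlsplit(record["uri"]).query)
    record["query"] = {name: values[0] for name, values in query.items()}

    # Individual cookies from the raw Cookie request header.
    jar = SimpleCookie()
    jar.load(record["cookie"])
    record["cookies"] = {name: morsel.value for name, morsel in jar.items()}
    return record


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /search?q=pig&page=2 HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/5.0" '
    '"session=abc123; theme=dark"'
)
record = parse_line(sample)
```

With the sample line above, record["query"]["q"] holds the value of the q
parameter and record["cookies"]["session"] holds the value of the session
cookie. A hand-written regular expression like this one is tied to a single
LogFormat, which is the limitation a LogFormat-driven parser is meant to
remove.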
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)