[
https://issues.apache.org/jira/browse/PIG-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659813#comment-14659813
]
Niels Basjes commented on PIG-4639:
-----------------------------------
I understand your concern.
Of course you _can_ include it in PIG because it is Apache licensed...
However...
I discussed the question of whether something like this should be added to PIG itself
with [~alangates] at a conference in Amsterdam (long ago). The design goal of
this library is that it is usable from a multitude of tools, and PIG just
happens to be one of them. Apache Drill is currently working on including this
parser in a similar way to what I did in this patch: DRILL-3423.
So I think PIG is the wrong place to add it.
Viable options I see right now:
# I simply promise you that I will accept bug fixes. As a safeguard, you keep a copy
of the code base on your own systems.
# Convert this into an Apache project, either by making it part of an existing
project or by creating a new one (maybe "Apache Commons HttpLogParser"?).
That would mean that other people need to join in.
# Revert this patch and go for PIG-4417, where a user who needs this can simply
download it directly from Maven Central (i.e. the dependency is only
there if a user chooses it):
{{REGISTER ivy://nl.basjes.parse.httpdlog:httpdlog-pigloader:2.1.1}}
In this case I think simply adding some documentation near the existing 'RegEx'
log parsers, pointing people towards the 'externally hosted alternative',
would help.
# Revert this patch and simply "Won't Fix".
What options do you see as valid for this case?
> Add better parser for Apache HTTPD access log.
> ----------------------------------------------
>
> Key: PIG-4639
> URL: https://issues.apache.org/jira/browse/PIG-4639
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.15.0
> Reporter: Niels Basjes
> Assignee: Niels Basjes
> Fix For: 0.16.0
>
> Attachments: PIG-4639-20150723-classnotfound.patch,
> PIG-4639-20150725.patch, PIG-4639-20150805-1247.patch
>
>
> Currently there are two parsers for Apache HTTPD access log files in piggybank
> that only allow parsing the 'combined' and 'common' log formats. These two
> also only parse the 'basics'.
> This is a proposed patch to add the existing
> https://github.com/nielsbasjes/logparser (Apache 2.0 license) as an 'out of
> the box' parser to piggybank.
> This parser parses the log file using the LogFormat specification used to
> write it. Almost all LogFormat specifiers are supported, so it adds
> easy parsing capabilities for (almost) all custom log formats used in
> production scenarios.
> This parser also goes much deeper in the sense that it allows extracting
> things like the value of a cookie or the value of a query string parameter.
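To illustrate the two ideas in the description (deriving the parser from the LogFormat string, then drilling into fields such as a query string parameter or a cookie), here is a minimal Python sketch. It is NOT the logparser library's API (that library is Java); the names `logformat_to_regex`, `parse_line` and the field names are hypothetical, and only a small subset of LogFormat directives (`%h %l %u %t %r %>s %b` and request headers `%{Name}i`) is handled.

```python
import re
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of a few LogFormat directives to regex fragments.
# The real logparser supports far more directives than this subset.
_DIRECTIVES = {
    "%h": r"(?P<remote_host>\S+)",
    "%l": r"(?P<logname>\S+)",
    "%u": r"(?P<user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": r'(?P<request>[^"]*)',
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<bytes>\d+|-)",
}

# Matches "%{Header}i" (a request header) or a simple directive like "%h" / "%>s".
_TOKEN = re.compile(r"%\{([^}]+)\}i|%>?[A-Za-z]")


def logformat_to_regex(logformat):
    """Build a line regex from a LogFormat string (subset only)."""
    parts, pos = [], 0
    for m in _TOKEN.finditer(logformat):
        parts.append(re.escape(logformat[pos:m.start()]))  # literal text between directives
        if m.group(1):
            # %{Cookie}i -> group "header_cookie", %{User-Agent}i -> "header_user_agent"
            name = "header_" + re.sub(r"\W", "_", m.group(1).lower())
            parts.append(r'(?P<%s>[^"]*)' % name)
        elif m.group(0) in _DIRECTIVES:
            parts.append(_DIRECTIVES[m.group(0)])
        else:
            raise ValueError("unsupported directive: " + m.group(0))
        pos = m.end()
    parts.append(re.escape(logformat[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def parse_line(regex, line):
    """Parse one log line, then go 'deeper' into the URI and the Cookie header."""
    m = regex.match(line)
    if m is None:
        raise ValueError("line does not match the LogFormat")
    fields = m.groupdict()
    # "%r" is "METHOD URI PROTOCOL"; extract the query string parameters of the URI.
    request = (fields.get("request") or "").split(" ")
    uri = request[1] if len(request) == 3 else ""
    fields["query"] = {k: v[0] for k, v in parse_qs(urlsplit(uri).query).items()}
    # Split the Cookie request header (if it was logged) into name/value pairs.
    cookie = SimpleCookie()
    cookie.load(fields.get("header_cookie") or "")
    fields["cookies"] = {k: morsel.value for k, morsel in cookie.items()}
    return fields
```

For example, with the LogFormat `%h %l %u %t "%r" %>s %b "%{Cookie}i"`, the line `127.0.0.1 - - [05/Aug/2015:12:47:00 +0200] "GET /search?q=pig&page=2 HTTP/1.1" 200 1234 "session=abc123; theme=dark"` yields `query["q"] == "pig"` and `cookies["session"] == "abc123"`, without writing a format-specific regex by hand.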