[
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570647#comment-15570647
]
ASF GitHub Bot commented on DRILL-3423:
---------------------------------------
Github user cgivre commented on the issue:
https://github.com/apache/drill/pull/607
@chunhui-shi
I've actually been thinking about writing a generic log parser for Drill in
which the user would provide a regex with capture groups and a list of field
names. For instance, consider the following MySQL log file:
```
070823 21:00:32 1 Connect root@localhost on test1
070823 21:00:48 1 Query show tables
070823 21:00:56 1 Query select * from category
070917 16:29:01 21 Query select * from location
070917 16:29:12 21 Query select * from location where id = 1
LIMIT 1
```
You can't really split this by space or tab, and dissecting it with various
string slicing functions would lead to some very complex and ugly queries.
But with the following regex:
```
^(\d{6}\s\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)$
```
you can extract all four fields and query them directly.
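To make the idea concrete, here is a minimal sketch in Python (not Drill code) that pairs the regex groups with a user-supplied field list. The field names and the `parse_line` helper are hypothetical and chosen only for illustration. Lines that do not match the pattern, such as the continuation line `LIMIT 1`, are simply skipped here; a real parser would need to decide how to handle them.

```python
import re

# Regex taken from the comment; it has four capture groups.
LOG_PATTERN = re.compile(r"^(\d{6}\s\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)$")

# Hypothetical field names, one per capture group (an assumption, not from Drill).
FIELD_NAMES = ["event_time", "connection_id", "command", "argument"]

SAMPLE_LINES = [
    "070823 21:00:32 1 Connect root@localhost on test1",
    "070823 21:00:48 1 Query show tables",
    "070823 21:00:56 1 Query select * from category",
    "070917 16:29:01 21 Query select * from location",
    "070917 16:29:12 21 Query select * from location where id = 1",
    "LIMIT 1",
]


def parse_line(line):
    """Return a dict mapping field names to captured groups, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


# Keep only the lines that matched the pattern.
records = [rec for rec in map(parse_line, SAMPLE_LINES) if rec is not None]

for rec in records:
    print(rec)
```

Every captured value comes back as a string; converting types (for example, turning the timestamp into a date) would be a separate step driven by the field list.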
As for the HTTPD log parser, it accepts a format string in the
configuration (https://issues.apache.org/jira/browse/DRILL-3423), and with
that you can parse any kind of HTTPD log.
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jim Scott
> Fix For: Future
>
>
> Add an HTTPD logparser-based format plugin. The author has been kind enough
> to relicense the logparser project under the Apache License. You can find
> it here:
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
>