Github user cgivre commented on the issue:
https://github.com/apache/drill/pull/607
@chunhui-shi
I've actually been thinking about writing a generic log parser for Drill, in
which the user would provide a regex with capture groups and a list of field
names. For instance, consider the following MySQL query log file:
```
070823 21:00:32 1 Connect root@localhost on test1
070823 21:00:48 1 Query show tables
070823 21:00:56 1 Query select * from category
070917 16:29:01 21 Query select * from location
070917 16:29:12 21 Query select * from location where id = 1
LIMIT 1
```
You can't really split this by space or tab, and dissecting it with various
string slicing functions would lead to some very complex and ugly queries.
But with the following regex:
```
^(\d{6}\s\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)$
```
You can extract all the fields and query them.
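To make the idea concrete, here is a rough sketch in Python rather than in Drill itself. It only illustrates pairing the regex groups with a user-supplied field list; the field names (`event_time`, `connection_id`, `command`, `argument`) and the `parse_line` helper are made up for this example and are not part of any Drill API.

```python
import re

# The regex from above: timestamp, numeric id, command word, remainder.
LOG_PATTERN = re.compile(r"^(\d{6}\s\d{2}:\d{2}:\d{2})\s+(\d+)\s(\w+)\s+(.+)$")

# Hypothetical field names, one per capture group (assumed, not from Drill).
FIELD_NAMES = ["event_time", "connection_id", "command", "argument"]


def parse_line(line):
    """Return a dict mapping field names to captured groups, or None if no match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


SAMPLE = [
    "070823 21:00:32 1 Connect root@localhost on test1",
    "070823 21:00:48 1 Query show tables",
    "070823 21:00:56 1 Query select * from category",
    "070917 16:29:01 21 Query select * from location",
    "070917 16:29:12 21 Query select * from location where id = 1",
    "LIMIT 1",
]

# Keep only the lines that matched; unmatched lines come back as None.
rows = [row for row in (parse_line(line) for line in SAMPLE) if row is not None]

for row in rows:
    print(row)
```

Note that the continuation line `LIMIT 1` does not match the pattern, so a real parser would need to decide whether to drop such lines, report them, or append them to the previous record.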
As for the HTTPD log parser, it accepts a format string in the
configuration (https://issues.apache.org/jira/browse/DRILL-3423),
and with that you can parse any kind of HTTPD log.