[jira] [Commented] (DRILL-3423) Add New HTTPD format plugin

Jim Scott (JIRA) Thu, 05 Nov 2015 07:18:09 -0800

[https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991779#comment-14991779]


Jim Scott commented on DRILL-3423:
----------------------------------

Jacques,

I'm not sure I follow this comment: "We should also avoid the use of dot 
delimiters being automatically generated by Drill."

Am I correct that your concern is specifically with the configuration of the 
plugin and the mapping of the field names?

Here is the problem I have with creating the mappings in the configuration:
1. There are WAY more ways the parser can parse a field than it is practical for 
us to create mappings for (e.g., a time field yields both a timezone-based 
result and a UTC-based one).
2. By providing a mapping within the Drill plugin, we would have to expose a 
default for everything that may show up in the log parser (e.g., if a new 
feature shows up in the log parser, we couldn't expose it until we changed the 
plugin).
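One way to avoid per-field mappings in the configuration is to derive the Drill column name from whatever path the parser reports. The sketch below is a hypothetical illustration, not the plugin's actual code: the class and method names are invented, and it assumes parser paths take the form `TYPE:dotted.name` (as in `IP:connection.client.ip`). It also replaces the dots with underscores, on the assumption that dotted column names would be confused with Drill's map-access syntax.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives Drill column names from logparser field paths,
// so new parser fields need no per-field mapping in the plugin configuration.
class ColumnNames {

  // Drops the "TYPE:" prefix and replaces dot delimiters with underscores,
  // so the resulting column name is not read as Drill map access.
  static String fromParserPath(String path) {
    int colon = path.indexOf(':');
    String name = colon >= 0 ? path.substring(colon + 1) : path;
    return name.replace('.', '_');
  }

  // Builds a column-name -> parser-path lookup for every path the parser reports,
  // preserving the parser's ordering.
  static Map<String, String> mapAll(List<String> parserPaths) {
    Map<String, String> columns = new LinkedHashMap<>();
    for (String path : parserPaths) {
      columns.put(fromParserPath(path), path);
    }
    return columns;
  }
}
```

For example, `ColumnNames.mapAll(List.of("IP:connection.client.ip", "IP:connection.server.ip"))` yields the columns `connection_client_ip` and `connection_server_ip`. A real implementation would still have to resolve name collisions when the same path is available under several types (point 1 above).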

Regarding wildcard maps of data, I can just as easily remove the :map suffix 
from the end of the field name. I'm indifferent, really. I put it there to make 
it blatantly obvious.
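Dropping the `:map` suffix should be harmless if the wildcard in the path already identifies a map field. The following is a minimal hypothetical sketch of that idea. `FieldSpec` and its members are invented for illustration, and the assumption that logparser wildcard paths end in `.*` should be checked against the parser's documentation.

```java
// Hypothetical sketch: reads a configured field name with or without the
// ":map" suffix. Assumes logparser wildcard paths end in ".*".
final class FieldSpec {
  static final String MAP_SUFFIX = ":map";

  final String path;          // parser path with any ":map" suffix removed
  final boolean wildcardMap;  // true if the field should become a Drill map

  private FieldSpec(String path, boolean wildcardMap) {
    this.path = path;
    this.wildcardMap = wildcardMap;
  }

  static FieldSpec parse(String configured) {
    // Accept the suffix, but don't require it.
    String path = configured.endsWith(MAP_SUFFIX)
        ? configured.substring(0, configured.length() - MAP_SUFFIX.length())
        : configured;
    // The trailing wildcard already marks a map, so the suffix is redundant.
    return new FieldSpec(path, path.endsWith(".*"));
  }
}
```

With this approach, both spellings of a wildcard field resolve to the same `FieldSpec`, so the suffix becomes a matter of taste rather than a requirement.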

As for creating maps like this example:
{code:java}
case "IP:connection.client.ip":
  add(parser, path, writer.rootAsMap().map("client").varChar("ip"));
  break;
case "IP:connection.client.peerip":
  add(parser, path, writer.rootAsMap().map("client").varChar("peer_ip"));
  break;
case "IP:connection.server.ip":
  add(parser, path, writer.rootAsMap().map("server").varChar("ip"));
{code}
This model makes it extremely difficult to support mapping of data types, 
because it assumes those fields are varChar and nothing else. Also, given the 
life cycle of creating maps within Drill, I don't think this is the most 
logical approach. Technical details aside, as a user I'm not convinced I 
benefit from nesting the data into maps. From a data structure perspective I 
understand why someone might want to do this, but from a query perspective I 
think it makes the data more difficult to query.
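The varChar objection can be illustrated by letting the parser path's type prefix, rather than a hard-coded `case` label, decide the column type. This is a hypothetical sketch only. `ColumnType` stands in for Drill's minor types and is not Drill's API, and the prefixes `TIME.EPOCH` and `BYTES` are assumed examples of numeric parser outputs.

```java
import java.util.Map;

// Hypothetical sketch: chooses a column type from the parser's type prefix
// instead of assuming varChar for every field.
class ColumnTypes {

  // Stand-in for Drill's minor types; not Drill's actual API.
  enum ColumnType { VARCHAR, BIGINT }

  // Assumed type prefixes, for illustration only.
  private static final Map<String, ColumnType> BY_PREFIX = Map.of(
      "TIME.EPOCH", ColumnType.BIGINT,
      "BYTES", ColumnType.BIGINT);

  // Falls back to VARCHAR for unknown prefixes or paths without a prefix.
  static ColumnType forPath(String path) {
    int colon = path.indexOf(':');
    if (colon < 0) {
      return ColumnType.VARCHAR;
    }
    return BY_PREFIX.getOrDefault(path.substring(0, colon), ColumnType.VARCHAR);
  }
}
```

Because the lookup is keyed on the type prefix rather than on individual field names, a new parser field of a known type is handled without a code change. Combined with flat column names, this also avoids the map nesting discussed above.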

> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
>
>
> Add an HTTPD logparser-based format plugin. The author has been kind enough 
> to release the logparser project under the Apache License. It can be found 
> here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
