[
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649391#comment-14649391
]
Jacques Nadeau commented on DRILL-3423:
---------------------------------------
Q1: The main reason is that Drill is targeting analysts rather than
developers. We are very focused on separating out data definition from
business rules. The user should have to provide no more information than is
necessary to interact with a new data source. In the case of an Apache HTTPD
log, the only thing that is needed is a format string. From there, a user can use
the SQL interface to create alternative views, etc. (things that support their
particular business needs). The future goal is to make more formats
self-describing directly (as we have already done with Parquet) or indirectly
using what we call a .drill file. This is the same pattern that we use for
JSON, Avro, HBase, etc. It allows non-technical users to interact with new
data quickly and easily. (Note that this also works better in Drill because we
have first-class capabilities around complex data and the JSON document model.)
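As a rough illustration of why a format string alone can be enough to describe
the records, here is a minimal Python sketch. It is not the httpdlog-parser
library or the Drill plugin: the directive table and the names
compile_log_format and parse_line are hypothetical, and only a handful of
LogFormat directives are covered.

```python
import re

# Hypothetical mapping from a few Apache LogFormat directives to
# (field name, regex). An assumption for illustration only; this is
# not how httpdlog-parser or Drill implements it.
DIRECTIVES = {
    "%h": ("remote_host", r"\S+"),
    "%l": ("remote_logname", r"\S+"),
    "%u": ("remote_user", r"\S+"),
    "%t": ("request_time", r"\[[^\]]+\]"),
    "%r": ("request_line", r"[^\"]*"),
    "%>s": ("status", r"\d{3}"),
    "%b": ("response_bytes", r"\d+|-"),
}


def compile_log_format(fmt):
    """Turn a LogFormat string into a regex with one named group per directive."""
    # Try longer directives first so "%>s" is matched as a single token.
    token = re.compile(
        "|".join(re.escape(d) for d in sorted(DIRECTIVES, key=len, reverse=True))
    )
    pattern, pos = "", 0
    for match in token.finditer(fmt):
        # Text between directives (spaces, quotes) must match literally.
        pattern += re.escape(fmt[pos:match.start()])
        name, field_regex = DIRECTIVES[match.group()]
        pattern += "(?P<%s>%s)" % (name, field_regex)
        pos = match.end()
    pattern += re.escape(fmt[pos:])
    return re.compile("^" + pattern + "$")


def parse_line(compiled, line):
    """Return a dict of field name -> raw string, or None if the line does not match."""
    match = compiled.match(line)
    return match.groupdict() if match else None


COMMON_LOG_FORMAT = '%h %l %u %t "%r" %>s %b'
SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_line(compile_log_format(COMMON_LOG_FORMAT), SAMPLE_LINE)
```

The point of the sketch is that the field names come from the format string
itself, so the user supplies nothing else; views and business rules can then
be layered on top in SQL.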
Q2: This has to do with the most efficient way to write into Drill and the fact
that we want to manage the write path to provide a clean and consistent
complex data model for the underlying format.
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>
> Add an HTTPD logparser based format plugin. The author has been kind enough
> to move the logparser project to be released under the Apache License. It
> can be found here:
> <dependency>
> <groupId>nl.basjes.parse.httpdlog</groupId>
> <artifactId>httpdlog-parser</artifactId>
> <version>2.0</version>
> </dependency>
>