[ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649404#comment-14649404 ]
Jacques Nadeau commented on DRILL-3423:
---------------------------------------
Q1: I should provide better comments in the code. Vector memory allocations
are rounded up to the next power of 2, and VarChar uses n+1 slots when
allocating data. As such, if we make batches 4095 records in size, then varchar
allocations will be 4096 slots and we will have minimal wastage due to
power-of-2 rounding. If we chose 4096, then varchar allocations would be 4097
slots and thus the underlying memory allocation would be 8192, with nearly half
of that wasted.
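To make the arithmetic concrete, here is a minimal sketch of the rounding
described above. It is not Drill's allocator code: the class and method names
(BatchSizing, nextPowerOfTwo, wastedSlots) are hypothetical, and it assumes
only what is stated here, namely that a request for n+1 slots is rounded up to
the next power of 2.

```java
class BatchSizing {
    // Hypothetical helper: round n up to the next power of two.
    // This illustrates the rounding rule only; it is not Drill's allocator.
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Slots left unused when a batch of batchSize records requests
    // batchSize + 1 slots and the request is rounded up to a power of two.
    static int wastedSlots(int batchSize) {
        int requested = batchSize + 1;
        return nextPowerOfTwo(requested) - requested;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {4095, 4096}) {
            int requested = batchSize + 1;
            System.out.println("batch=" + batchSize
                + " requested=" + requested
                + " allocated=" + nextPowerOfTwo(requested)
                + " wasted=" + wastedSlots(batchSize));
        }
    }
}
```

With a batch of 4095 the request is 4096 slots and nothing is wasted; with a
batch of 4096 the request is 4097 slots, the allocation is 8192, and 4095 slots
go unused.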
Q2: My plan was actually to write a blog post around this plugin so people
could use it as a model. (That is one of the reasons I kept it in a single
file.) I wanted to get something up for feedback, but I will be working on
adding javadocs to clarify things.
Q3: Good point. We should implement a new FormatMatcher for access logs that
recognizes this pattern. Can you provide a couple of examples and maybe
propose a format matching algorithm?
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
> Fix For: 1.2.0
>
>
> Add an HTTPD format plugin based on logparser. The author has been kind
> enough to relicense the logparser project under the Apache License. It can
> be found here:
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
>