[ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649404#comment-14649404 ]

Jacques Nadeau commented on DRILL-3423:
---------------------------------------

Q1:  I should provide better comments in the code.  Vector memory allocations 
work in powers of 2, and VarChar uses n+1 slots (for its offsets) when 
allocating data.  As such, if we make batches 4095 records in size, then VarChar 
allocations will be 4096 slots and we will have minimal waste from power-of-2 
rounding.  If we chose 4096, then VarChar allocations would be 4097 slots, so 
the underlying memory allocation would round up to 8192, with nearly half of 
that wasted.
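The rounding arithmetic above can be checked with a small Java sketch. It only 
models what the comment describes (n+1 slots, rounded up to a power of 2) and 
counts slots rather than bytes; `BatchSizing`, `nextPowerOfTwo` and 
`wastedSlots` are hypothetical names, not Drill's allocator code.

```java
// Hypothetical sketch of the power-of-2 rounding described above.
// Not Drill's allocator: it just counts slots for a batch of N VarChar values.
class BatchSizing {

    // Smallest power of two that is >= n (assumes n >= 1).
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    // VarChar needs batchSize + 1 slots; the allocation is rounded up to a
    // power of two, and everything beyond the needed slots is wasted.
    static long wastedSlots(int batchSize) {
        long needed = batchSize + 1L;
        return nextPowerOfTwo(needed) - needed;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {4095, 4096}) {
            long needed = batchSize + 1L;
            long allocated = nextPowerOfTwo(needed);
            System.out.printf("batch=%d needed=%d allocated=%d wasted=%d%n",
                    batchSize, needed, allocated, wastedSlots(batchSize));
        }
    }
}
```

Running it shows that a batch of 4095 needs exactly 4096 slots with nothing 
wasted, while a batch of 4096 needs 4097 slots and gets 8192, wasting 4095 of 
them. Because each offset is a fixed-width value, scaling slots to bytes keeps 
the same power-of-2 relationship.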

Q2: My plan was actually to write a blog post around this plugin so people 
could use it as a model.  (That is one of the reasons I kept it in a single 
file.)  I wanted to get something up for feedback, but I will be working on 
adding javadocs to clarify things.

Q3: Good point.  We should implement a new FormatMatcher for access logs that 
recognizes this pattern.  Can you provide a couple of examples and maybe 
propose a format matching algorithm?

> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>             Fix For: 1.2.0
>
>
> Add an HTTPD logparser-based format plugin.  The author has been kind enough 
> to relicense the logparser project under the Apache License.  It can be 
> found here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
