This is an automated email from the ASF dual-hosted git repository.

cgivre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill.git


The following commit(s) were added to refs/heads/master by this push:
     new f79587e  [DRILL-7817] Allow immediate parsing of the useragent from within the HTTPD LogFormat format plugin
f79587e is described below

commit f79587ed14767463129f8905d2e36e55563655f2
Author: Niels Basjes <[email protected]>
AuthorDate: Mon Dec 7 21:53:36 2020 +0100

    [DRILL-7817] Allow immediate parsing of the useragent from within the HTTPD LogFormat format plugin
---
 contrib/format-httpd/README.md                     |  87 +++-
 contrib/format-httpd/pom.xml                       |  12 +-
 .../exec/store/httpd/HttpdLogBatchReader.java      |   8 +-
 .../exec/store/httpd/HttpdLogFormatConfig.java     |  41 +-
 .../exec/store/httpd/HttpdLogFormatPlugin.java     |   6 +-
 .../drill/exec/store/httpd/HttpdLogRecord.java     |  15 +-
 .../apache/drill/exec/store/httpd/HttpdParser.java | 147 ++++---
 .../apache/drill/exec/store/httpd/HttpdUtils.java  |  14 -
 .../main/resources/bootstrap-format-plugins.json   |   6 +-
 .../drill/exec/store/httpd/TestHTTPDLogReader.java | 447 ++++++++-------------
 .../store/httpd/TestHTTPDLogReaderUserAgent.java   | 262 ++++++++++++
 .../test/resources/httpd/multiformat.access_log    |   3 +
 .../src/test/resources/httpd/typeremap.log         |   3 +
 .../src/test/resources/logback-test.txt            |   5 +-
 contrib/udfs/pom.xml                               |   3 +-
 exec/java-exec/pom.xml                             |   2 +-
 pom.xml                                            |   3 +
 17 files changed, 673 insertions(+), 391 deletions(-)
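
Reviewer note on the `logParserRemapping` option: the README change in this commit describes the value as a `;`-separated list of mappings, each a `:`-separated `field:type[:cast]` entry with `cast` one of `STRING`, `LONG` or `DOUBLE` (defaulting to `STRING`). The format can be sketched in standalone Java as below. This is an illustration only and not the code from the commit: the class `RemappingSketch` and its `Mapping` holder are hypothetical names, nothing here depends on logparser, and unlike `applyRemapping` in the diff it strips all whitespace and rejects entries that lack a type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill or logparser.
public class RemappingSketch {

  /** One parsed mapping: logparser field name, new type, and cast. */
  public static final class Mapping {
    public final String field;
    public final String type;
    public final String cast;

    Mapping(String field, String type, String cast) {
      this.field = field;
      this.type = type;
      this.cast = cast;
    }
  }

  /** Parses "field:type[:cast];field:type[:cast];..." into a list of mappings. */
  public static List<Mapping> parse(String remapping) {
    List<Mapping> result = new ArrayList<>();
    if (remapping == null || remapping.isEmpty()) {
      return result;
    }
    for (String rawEntry : remapping.split(";")) {
      // Assumption: all whitespace is insignificant, so remove it everywhere.
      String entry = rawEntry.replaceAll("\\s+", "");
      if (entry.isEmpty()) {
        continue; // tolerate empty entries such as a trailing ';'
      }
      String[] parts = entry.split(":");
      if (parts.length < 2 || parts.length > 3) {
        throw new IllegalArgumentException(
            "Expected field:type[:cast] but got: " + entry);
      }
      // The cast is optional and defaults to STRING.
      String cast = parts.length == 3 ? parts[2] : "STRING";
      if (!cast.equals("STRING") && !cast.equals("LONG") && !cast.equals("DOUBLE")) {
        throw new IllegalArgumentException("Invalid cast: " + cast);
      }
      result.add(new Mapping(parts[0], parts[1], cast));
    }
    return result;
  }

  public static void main(String[] args) {
    String example =
        "request.firstline.uri.query.ua        :HTTP.USERAGENT;\n"
      + "request.firstline.uri.query.timestamp :TIME.EPOCH    : LONG";
    for (Mapping m : parse(example)) {
      System.out.println(m.field + " -> " + m.type + " (" + m.cast + ")");
    }
  }
}
```

The example string in `main` is the one used in the README query in this commit. Rejecting an entry without a type is a choice made for this sketch; the committed `applyRemapping` reads `parts[1]` without a length check.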

diff --git a/contrib/format-httpd/README.md b/contrib/format-httpd/README.md
index 4d45c0a..87f02d8 100644
--- a/contrib/format-httpd/README.md
+++ b/contrib/format-httpd/README.md
@@ -3,33 +3,37 @@ This plugin enables Drill to read and query httpd (Apache Web 
Server) and nginx
 ) which is available here: https://github.com/nielsbasjes/logparser.
 
 ## Configuration
-There are five fields which you can to configure in order for Drill to read 
web server logs.  In general the defaults should be fine, however the fields 
are:
+There are several fields which you can specify in order for Drill to read web 
server logs. In general the defaults should be fine, however the fields are:
 * **`logFormat`**:  The log format string is the format string found in your 
web server configuration. If you have multiple logFormats then you can add all 
of them in this
  single parameter separated by a newline (`\n`). The parser will automatically 
select the first matching format.
+ Note that the well known formats `common`, `combined`, `combinedio`, 
`referer` and `agent` are also accepted as logFormat.
+ Be aware of leading and trailing spaces on a line when configuring this!
 * **`timestampFormat`**:  The format of time stamps in your log files. This 
setting is optional and is almost never needed.
 * **`extensions`**:  The file extension of your web server logs.  Defaults to 
`httpd`.
 * **`maxErrors`**:  Sets the plugin error tolerance. When set to any value 
less than `0`, Drill will ignore all errors. If unspecified then maxErrors is 0 
which will cause the query to fail on the first error.
 * **`flattenWildcards`**: There are a few variables which Drill extracts into 
maps.  Defaults to `false`.
+* **`parseUserAgent`**: When set to true the [Yauaa useragent 
analyzer](https://yauaa.basjes.nl) will be applied to the UserAgent field if 
present. Defaults to `false` because of the extra startup and memory overhead.
+* **`logParserRemapping`**: This makes it possible to parse deeper into the 
logline in custom situations. See documentation below for further info.
 
-
+In common situations the config will look something like this (two log formats separated by a newline `\n`):
 ```json
 "httpd" : {
   "type" : "httpd",
-  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
-  "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
-  "maxErrors": 0, 
-  "flattenWildcards": false
+  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" 
%V\ncombined",
+  "maxErrors" : 0,
+  "flattenWildcards" : true,
+  "parseUserAgent" : true
 }
 ```
 
 ## Data Model
 The fields which Drill will return from HTTPD access logs should be fairly 
self explanatory and should all be mapped to correct data types.  For instance, 
`TIMESTAMP` fields are
- all Drill `TIMESTAMPS` and so forth. 
- 
+ all Drill `TIMESTAMPS` and so forth.
+
 ### Nested Columns
 The HTTPD parser can produce a few columns of nested data. For instance, the 
various `query_string` columns are parsed into Drill maps so that if you want 
to look for a specific
- field, you can do so. 
- 
+ field, you can do so.
+
  Drill allows you to directly access maps using the format:
  ```
 <table>.<map>.<field>
@@ -41,24 +45,67 @@ FROM dfs.test.`logfile.httpd` AS mylogs
 
 ```
 In this example, we assign an alias of `mylogs` to the table, the column name 
is `request_firstline_uri_query_$` and then the individual field within that 
mapping is `username
-`.  This particular example enables you to analyze items in query strings.  
+`.  This particular example enables you to analyze items in query strings.
 
 ### Flattening Maps
-In the event that you have a map field that you would like broken into columns 
rather than getting the nested fields, you can set the `flattenWildcards` 
option to `true` and 
-Drill will create columns for these fields.  For example if you have a URI 
Query option called `username`.  If you selected the `flattedWildcards` option, 
Drill will create a 
-field called `request_firstline_uri_query_username`.  
+In the event that you have a map field that you would like broken into columns 
rather than getting the nested fields, you can set the `flattenWildcards` 
option to `true` and
+Drill will create columns for these fields.  For example, if you have a URI query option called `username` and you set the `flattenWildcards` option, Drill will create a
+field called `request_firstline_uri_query_username`.
 
-** Note that underscores in the field name are replaced with double 
underscores ** 
- 
- ## Useful Functions
+**Note that underscores in the field name are replaced with double underscores.**
+
+## Useful Functions
  If you are using Drill to analyze web access logs, there are a few other 
useful functions which you should know about:
- 
+
  * `parse_url(<url>)`: This function accepts a URL as an argument and returns 
a map of the URL's protocol, authority, host, and path.
  * `parse_query(<query_string>)`: This function accepts a query string and 
returns a key/value pairing of the variables submitted in the request.
  * `parse_user_agent(<user agent>)`, `parse_user_agent( <useragent field>, 
<desired field> )`: The function parse_user_agent() takes a user agent string 
as an argument and
-  returns a map of the available fields. Note that not every field will be 
present in every user agent string. 
+  returns a map of the available fields. Note that not every field will be 
present in every user agent string.
   [Complete Docs 
Here](https://github.com/apache/drill/tree/master/contrib/udfs#user-agent-functions)
- 
+
+## LogParser type remapping
+**Advanced feature**
+The underlying [logparser](https://github.com/nielsbasjes/logparser) supports 
something called type remapping.
+Essentially it means that an extracted value which would normally be treated 
as an unparsable STRING can now be 'cast' to something
+that can be further cut into relevant pieces.
+
+The parameter string is a `;` separated list of mappings.
+Each mapping is a `:` separated list of
+- the name of the underlying logparser field (which is different from the Drill column name),
+- the underlying `type` which is used to determine which additional Dissectors 
can be applied.
+- optionally the `cast` (one of `STRING`, `LONG`, `DOUBLE`) which may impact 
the type of the Drill column
+
+Examples:
+- If you have a query parameter in the URL called `ua` which is really the 
UserAgent string and you would like to parse this you can add
+`request.firstline.uri.query.ua:HTTP.USERAGENT`
+- If you have a query parameter in the URL called `timestamp` which is really 
the numerical timestamp (epoch milliseconds).
+The additional "LONG" will cause the returned value to be a long, which tells Drill that the `TIME.EPOCH` is to be interpreted as a `TIMESTAMP` column.
+`request.firstline.uri.query.timestamp:TIME.EPOCH:LONG`
+
+Combining all of this can make a query that does something like this:
+```sql
+SELECT
+          `request_receive_time_epoch`
+        , `request_user-agent`
+        , `request_user-agent_device__name`
+        , `request_user-agent_agent__name__version__major`
+        , `request_firstline_uri_query_timestamp`
+        , `request_firstline_uri_query_ua`
+        , `request_firstline_uri_query_ua_device__name`
+        , `request_firstline_uri_query_ua_agent__name__version__major`
+FROM       table(
+             cp.`httpd/typeremap.log`
+                 (
+                   type => 'httpd',
+                   logFormat => 'combined\n%h %l %u %t \"%r\" %>s %b',
+                   flattenWildcards => true,
+                   parseUserAgent => true,
+                   logParserRemapping => '
+                       request.firstline.uri.query.ua        :HTTP.USERAGENT;
+                       request.firstline.uri.query.timestamp :TIME.EPOCH    : 
LONG'
+                 )
+           )
+```
 
 ## Implicit Columns
 Data queried by this plugin will return two implicit columns:
diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
index 50ae618..bc6e463 100644
--- a/contrib/format-httpd/pom.xml
+++ b/contrib/format-httpd/pom.xml
@@ -38,7 +38,7 @@
     <dependency>
       <groupId>nl.basjes.parse.httpdlog</groupId>
       <artifactId>httpdlog-parser</artifactId>
-      <version>5.6</version>
+      <version>${httpdlog-parser.version}</version>
       <exclusions>
         <exclusion>
           <groupId>commons-codec</groupId>
@@ -50,11 +50,19 @@
         </exclusion>
       </exclusions>
     </dependency>
+
     <dependency>
       <groupId>nl.basjes.parse.useragent</groupId>
       <artifactId>yauaa-logparser</artifactId>
-      <version>5.19</version>
+      <version>${yauaa.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>nl.basjes.parse.httpdlog</groupId>
+          <artifactId>httpdlog-parser</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
+
     <!-- Test dependencies -->
     <dependency>
       <groupId>org.apache.drill.exec</groupId>
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
index 07f1439..275132a 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
@@ -72,7 +72,13 @@ public class HttpdLogBatchReader implements 
ManagedReader<FileSchemaNegotiator>
     openFile(negotiator);
     errorContext = negotiator.parentErrorContext();
     try {
-      parser = new HttpdParser(formatConfig.getLogFormat(), 
formatConfig.getTimestampFormat(), formatConfig.getFlattenWildcards(), scan);
+      parser = new HttpdParser(
+              formatConfig.getLogFormat(),
+              formatConfig.getTimestampFormat(),
+              formatConfig.getFlattenWildcards(),
+              formatConfig.getParseUserAgent(),
+              formatConfig.getLogParserRemapping(),
+              scan);
       negotiator.tableSchema(parser.setupParser(), false);
     } catch (Exception e) {
       throw UserException.dataReadError(e)
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
index a1f5617..aa40e95 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
@@ -37,9 +37,10 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
   public final String logFormat;
   public final String timestampFormat;
   public final List<String> extensions;
-  public final boolean flattenWildcards;
   public final int maxErrors;
-
+  public final boolean flattenWildcards;
+  public final boolean parseUserAgent;
+  public final String logParserRemapping;
 
   @JsonCreator
   public HttpdLogFormatConfig(
@@ -47,7 +48,9 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
       @JsonProperty("logFormat") String logFormat,
       @JsonProperty("timestampFormat") String timestampFormat,
       @JsonProperty("maxErrors") int maxErrors,
-      @JsonProperty("flattenWildcards") boolean flattenWildcards
+      @JsonProperty("flattenWildcards") boolean flattenWildcards,
+      @JsonProperty("parseUserAgent") boolean parseUserAgent,
+      @JsonProperty("logParserRemapping") String logParserRemapping
   ) {
 
     this.extensions = extensions == null
@@ -57,6 +60,8 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
     this.timestampFormat = timestampFormat;
     this.maxErrors = maxErrors;
     this.flattenWildcards = flattenWildcards;
+    this.parseUserAgent = parseUserAgent;
+    this.logParserRemapping = logParserRemapping;
   }
 
   /**
@@ -78,13 +83,31 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
     return extensions;
   }
 
-  public int getMaxErrors() { return maxErrors;}
+  public int getMaxErrors() {
+    return maxErrors;
+  }
 
-  public boolean getFlattenWildcards () { return flattenWildcards; }
+  public boolean getFlattenWildcards () {
+    return flattenWildcards;
+  }
+
+  public boolean getParseUserAgent() {
+    return parseUserAgent;
+  }
+
+  public String getLogParserRemapping() {
+    return logParserRemapping;
+  }
 
   @Override
   public int hashCode() {
-    return Objects.hash(logFormat, timestampFormat, maxErrors, 
flattenWildcards);
+    return Objects.hash(
+            logFormat,
+            timestampFormat,
+            maxErrors,
+            flattenWildcards,
+            parseUserAgent,
+            logParserRemapping);
   }
 
   @Override
@@ -99,7 +122,9 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
     return Objects.equals(logFormat, other.logFormat)
       && Objects.equals(timestampFormat, other.timestampFormat)
       && Objects.equals(maxErrors, other.maxErrors)
-      && Objects.equals(flattenWildcards, other.flattenWildcards);
+      && Objects.equals(flattenWildcards, other.flattenWildcards)
+      && Objects.equals(parseUserAgent, other.parseUserAgent)
+      && Objects.equals(logParserRemapping, other.logParserRemapping);
   }
 
   @Override
@@ -109,6 +134,8 @@ public class HttpdLogFormatConfig implements 
FormatPluginConfig {
         .field("timestamp format", timestampFormat)
         .field("max errors", maxErrors)
         .field("flattenWildcards", flattenWildcards)
+        .field("parseUserAgent", parseUserAgent)
+        .field("logParserRemapping", logParserRemapping)
         .toString();
   }
 }
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
index 674bfdb..e81372f 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
@@ -36,13 +36,13 @@ public class HttpdLogFormatPlugin extends 
EasyFormatPlugin<HttpdLogFormatConfig>
 
   protected static final String DEFAULT_NAME = "httpd";
 
-  private static class HtttpLogReaderFactory extends FileReaderFactory {
+  private static class HttpLogReaderFactory extends FileReaderFactory {
 
     private final HttpdLogFormatConfig config;
     private final int maxRecords;
     private final EasySubScan scan;
 
-    private HtttpLogReaderFactory(HttpdLogFormatConfig config, int maxRecords, 
EasySubScan scan) {
+    private HttpLogReaderFactory(HttpdLogFormatConfig config, int maxRecords, 
EasySubScan scan) {
       this.config = config;
       this.maxRecords = maxRecords;
       this.scan = scan;
@@ -88,7 +88,7 @@ public class HttpdLogFormatPlugin extends 
EasyFormatPlugin<HttpdLogFormatConfig>
   @Override
   protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager 
options, EasySubScan scan) {
     FileScanFramework.FileScanBuilder builder = new 
FileScanFramework.FileScanBuilder();
-    builder.setReaderFactory(new HtttpLogReaderFactory(formatConfig, 
scan.getMaxRecords(), scan));
+    builder.setReaderFactory(new HttpLogReaderFactory(formatConfig, 
scan.getMaxRecords(), scan));
 
     initScanBuilder(builder, scan);
     builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
index 8f2c73a..c30b468 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
@@ -27,7 +27,6 @@ import org.apache.drill.exec.vector.accessor.ScalarWriter;
 import org.apache.drill.exec.vector.accessor.TupleWriter;
 import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
 
-import java.util.EnumSet;
 import java.util.HashMap;
 import java.util.Map;
 
@@ -159,7 +158,7 @@ public class HttpdLogRecord {
 
   /**
    * This method is referenced and called via reflection. This is added as a 
parsing target for the parser. It will get
-   * called when the value of a log field is a timesstamp data type.
+   * called when the value of a log field is a timestamp data type.
    *
    * @param field name of field
    * @param value value of field
@@ -180,7 +179,7 @@ public class HttpdLogRecord {
 
   /**
    * This method is referenced and called via reflection. This is added as a 
parsing target for the parser. It will get
-   * called when the value of a log field is a timesstamp data type.
+   * called when the value of a log field is a timestamp data type.
    *
    * @param field name of field
    * @param value value of field
@@ -374,7 +373,7 @@ public class HttpdLogRecord {
    *
    * @param parser The initialized HttpdParser
    * @param rowWriter An initialized RowSetLoader object
-   * @param type The Drill MinorType which sets the data type in the rowWriter
+   * @param columnCasts The logparser casts used to get the right data from 
the parser
    * @param parserFieldName The field name which is generated by the Httpd 
Parser.  These are not "Drill safe"
    * @param drillFieldName The Drill safe field name
    * @param mappedColumns A list of columns mapped to their correct Drill data 
type
@@ -382,12 +381,14 @@ public class HttpdLogRecord {
    */
   public void addField(final Parser<HttpdLogRecord> parser,
                        final RowSetLoader rowWriter,
-                       final EnumSet<Casts> type,
+                       final Map<String, Casts> columnCasts,
                        final String parserFieldName,
                        final String drillFieldName,
                        Map<String, MinorType> mappedColumns) throws 
NoSuchMethodException {
     final boolean hasWildcard = 
parserFieldName.endsWith(HttpdParser.PARSER_WILDCARD);
 
+    final Casts type = columnCasts.getOrDefault(drillFieldName, Casts.STRING);
+
     logger.debug("Field name: {}", parserFieldName);
     rootRowWriter = rowWriter;
     /*
@@ -401,10 +402,10 @@ public class HttpdLogRecord {
       parser.addParseTarget(this.getClass().getMethod("setWildcard", 
String.class, Double.class), parserFieldName);
       parser.addParseTarget(this.getClass().getMethod("setWildcard", 
String.class, Long.class), parserFieldName);
       wildcards.put(cleanName, getMapWriter(drillFieldName, rowWriter));
-    } else if (type.contains(Casts.DOUBLE) || 
mappedColumns.get(drillFieldName) == MinorType.FLOAT8) {
+    } else if (type.equals(Casts.DOUBLE) || mappedColumns.get(drillFieldName) 
== MinorType.FLOAT8) {
       parser.addParseTarget(this.getClass().getMethod("set", String.class, 
Double.class), parserFieldName);
       doubles.put(parserFieldName, rowWriter.scalar(drillFieldName));
-    } else if (type.contains(Casts.LONG) || mappedColumns.get(drillFieldName) 
== MinorType.BIGINT) {
+    } else if (type.equals(Casts.LONG) || mappedColumns.get(drillFieldName) == 
MinorType.BIGINT) {
         parser.addParseTarget(this.getClass().getMethod("set", String.class, 
Long.class), parserFieldName);
         longs.put(parserFieldName, rowWriter.scalar(drillFieldName));
     } else {
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
index 36fe949..c31c5ad 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
@@ -17,6 +17,8 @@
  */
 package org.apache.drill.exec.store.httpd;
 
+import nl.basjes.parse.useragent.analyze.InvalidParserConfigurationException;
+import nl.basjes.parse.useragent.dissector.UserAgentDissector;
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.types.TypeProtos;
 import org.apache.drill.common.types.TypeProtos.MinorType;
@@ -25,7 +27,6 @@ import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
-import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
 import nl.basjes.parse.core.Casts;
 import nl.basjes.parse.core.Parser;
 import nl.basjes.parse.core.exceptions.DissectionFailure;
@@ -35,45 +36,61 @@ import nl.basjes.parse.httpdlog.HttpdLoglineParser;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.util.ArrayList;
 import java.util.EnumSet;
-import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.TreeMap;
+
+import static nl.basjes.parse.core.Casts.DOUBLE;
+import static nl.basjes.parse.core.Casts.DOUBLE_ONLY;
+import static nl.basjes.parse.core.Casts.LONG;
+import static nl.basjes.parse.core.Casts.LONG_ONLY;
+import static nl.basjes.parse.core.Casts.STRING;
+import static nl.basjes.parse.core.Casts.STRING_ONLY;
 
 public class HttpdParser {
 
   private static final Logger logger = 
LoggerFactory.getLogger(HttpdParser.class);
 
   public static final String PARSER_WILDCARD = ".*";
-  public static final String REMAPPING_FLAG = "#";
   private final Parser<HttpdLogRecord> parser;
   private final List<SchemaPath> requestedColumns;
   private final Map<String, MinorType> mappedColumns;
+  private final Map<String, Casts> columnCasts;
   private final HttpdLogRecord record;
   private final String logFormat;
+  private final boolean parseUserAgent;
+  private final String logParserRemapping;
   private Map<String, String> requestedPaths;
-  private EnumSet<Casts> casts;
-
 
-  public HttpdParser(final String logFormat, final String timestampFormat, 
final boolean flattenWildcards, final EasySubScan scan) {
+  public HttpdParser(
+          final String logFormat,
+          final String timestampFormat,
+          final boolean flattenWildcards,
+          final boolean parseUserAgent,
+          final String logParserRemapping,
+          final EasySubScan scan) {
 
     Preconditions.checkArgument(logFormat != null && 
!logFormat.trim().isEmpty(), "logFormat cannot be null or empty");
 
     this.logFormat = logFormat;
+    this.parseUserAgent = parseUserAgent;
     this.record = new HttpdLogRecord(timestampFormat, flattenWildcards);
 
-    if (timestampFormat == null) {
-      this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat);
-    } else {
-      this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, 
timestampFormat);
-    }
+    this.logParserRemapping = logParserRemapping;
 
+    this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, 
this.logFormat, timestampFormat);
+    applyRemapping(parser);
     /*
-    * The log parser has the possibility of parsing the user agent and 
extracting additional fields
-    * Unfortunately, doing so negatively affects the speed of the parser.  
Uncommenting this line and another in
-    * the HttpLogRecord will enable these fields.  We will add this 
functionality in a future PR.
-    * this.parser.addDissector(new UserAgentDissector());
-    */
+     * The log parser has the possibility of parsing the user agent and extracting additional fields
+     * Unfortunately, doing so negatively affects the startup speed of the parser, even if it is not used.
+     * So it is only enabled if there is a need for it in the requested columns.
+     */
+    if (parseUserAgent) {
+      parser.addDissector(new UserAgentDissector());
+    }
+
 
     this.requestedColumns = scan.getColumns();
 
@@ -84,7 +101,40 @@ public class HttpdParser {
       logger.info("Specified logformat is a multiline log format: {}", 
logFormat);
     }
 
-    mappedColumns = new HashMap<>();
+    mappedColumns = new TreeMap<>();
+    columnCasts = new TreeMap<>();
+  }
+
+  private void applyRemapping(Parser<?> parser) {
+    if (logParserRemapping == null || logParserRemapping.isEmpty()) {
+      return;
+    }
+
+    for (String rawEntry: logParserRemapping.split(";")) {
+      String entry = rawEntry.replaceAll("\n","").replaceAll(" ","").trim();
+      if (entry.isEmpty()) {
+        continue;
+      }
+
+      String[] parts = entry.split(":");
+      String field = parts[0];
+      String newType = parts[1];
+      String castString = parts.length == 3 ? parts[2] : "STRING";
+
+      switch (castString) {
+        case "STRING":
+          parser.addTypeRemapping(field, newType, STRING_ONLY);
+          break;
+        case "LONG":
+          parser.addTypeRemapping(field, newType, LONG_ONLY);
+          break;
+        case "DOUBLE":
+          parser.addTypeRemapping(field, newType, DOUBLE_ONLY);
+          break;
+        default:
+          throw new InvalidParserConfigurationException("Invalid type remapping cast was specified");
+      }
+    }
   }
 
   /**
@@ -110,13 +160,14 @@ public class HttpdParser {
      * efficient way to parse the log.
      */
     List<String> allParserPaths = parser.getPossiblePaths();
+    allParserPaths.sort(String::compareTo);
 
     /*
      * Use all possible paths that the parser has determined from the 
specified log format.
      */
 
-    requestedPaths = Maps.newConcurrentMap();
-
+    // Create a mapping table to each allParserPaths field from their 
corresponding Drill column name.
+    requestedPaths = new TreeMap<>(); // Treemap to have a stable ordering!
     for (final String parserPath : allParserPaths) {
       requestedPaths.put(HttpdUtils.drillFormattedFieldName(parserPath), 
parserPath);
     }
@@ -127,42 +178,42 @@ public class HttpdParser {
      * because this will be the slowest parsing path possible for the 
specified format.
      */
     Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
+    applyRemapping(dummy);
 
-    /* This is the second line to uncomment to add the user agent parsing.
-    * dummy.addDissector(new UserAgentDissector());
-    */
-    dummy.addParseTarget(String.class.getMethod("indexOf", String.class), 
allParserPaths);
+    if (parseUserAgent) {
+      dummy.addDissector(new UserAgentDissector());
+    }
 
-    for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
+    dummy.addParseTarget(String.class.getMethod("indexOf", String.class), 
allParserPaths);
 
-      /*
-      If the column is not requested explicitly, remove it from the requested 
path list.
-       */
-      if (! isRequested(entry.getKey()) &&
-        !(isStarQuery()) &&
+    /*
+    If the column is not requested explicitly, remove it from the requested 
path list.
+     */
+    if (!isStarQuery() &&
         !isMetadataQuery() &&
-        !isOnlyImplicitColumns() ) {
-        requestedPaths.remove(entry.getKey());
-        continue;
+        !isOnlyImplicitColumns()) {
+      List<String> keysToRemove = new ArrayList<>();
+      for (final String key : requestedPaths.keySet()) {
+        if (!isRequested(key)) {
+          keysToRemove.add(key);
+        }
       }
+      keysToRemove.forEach( key -> requestedPaths.remove(key));
+    }
 
-      /*
-       * Check the field specified by the user to see if it is supposed to be 
remapped.
-       */
-      if (entry.getValue().startsWith(REMAPPING_FLAG)) {
-        /*
-         * Because this field is being remapped we need to replace the field 
name that the parser uses.
-         */
-        entry.setValue(entry.getValue().substring(REMAPPING_FLAG.length()));
-
-        final String[] pieces = entry.getValue().split(":");
-        HttpdUtils.addTypeRemapping(parser, pieces[1], pieces[0]);
-        casts = Casts.STRING_ONLY;
-      } else {
-        casts = dummy.getCasts(entry.getValue());
+    EnumSet<Casts> allCasts;
+    for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
+      allCasts = dummy.getCasts(entry.getValue());
+
+      // Select the cast we want to receive from the parser
+      Casts dataType = STRING;
+      if (allCasts.contains(DOUBLE)) {
+        dataType = DOUBLE;
+      } else if (allCasts.contains(LONG)) {
+        dataType = LONG;
       }
 
-      Casts dataType = (Casts) casts.toArray()[casts.size() - 1];
+      columnCasts.put(entry.getKey(), dataType);
 
       switch (dataType) {
         case STRING:
@@ -208,7 +259,7 @@ public class HttpdParser {
   public void addFieldsToParser(RowSetLoader rowWriter) {
     for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
       try {
-        record.addField(parser, rowWriter, casts, entry.getValue(), 
entry.getKey(), mappedColumns);
+        record.addField(parser, rowWriter, columnCasts, entry.getValue(), 
entry.getKey(), mappedColumns);
       } catch (NoSuchMethodException e) {
         logger.error("Error adding fields to parser.");
       }
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
index 5a975b6..bb8d28e 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
@@ -18,8 +18,6 @@
 
 package org.apache.drill.exec.store.httpd;
 
-import nl.basjes.parse.core.Parser;
-
 public class HttpdUtils {
 
   public static final String PARSER_WILDCARD = ".*";
@@ -44,18 +42,6 @@ public class HttpdUtils {
   }
 
   /**
-   * In order to define a type remapping the format of the field configuration 
will look like: <br/>
-   * HTTP.URI:request.firstline.uri.query.[parameter name] <br/>
-   *
-   * @param parser    Add type remapping to this parser instance.
-   * @param fieldName request.firstline.uri.query.[parameter_name]
-   * @param fieldType HTTP.URI, etc..
-   */
-  public static void addTypeRemapping(final Parser<HttpdLogRecord> parser, final String fieldName, final String fieldType) {
-    parser.addTypeRemapping(fieldName, fieldType);
-  }
-
-  /**
    * Returns true if the field is a wildcard AKA map field, false if not.
    * @param fieldName The target field name
    * @return True if the field is a wildcard, false if not
diff --git a/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
index 145e947..654c228 100644
--- a/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
+++ b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
@@ -5,7 +5,7 @@
       "formats": {
         "httpd" : {
           "type" : "httpd",
-          "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+          "logFormat" : "common\ncombined",
           "maxErrors": 0,
           "flattenWildcards": false
         }
@@ -16,7 +16,7 @@
       "formats": {
         "httpd" : {
           "type" : "httpd",
-          "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+          "logFormat" : "common\ncombined",
           "maxErrors": 0,
           "flattenWildcards": false
         }
@@ -27,7 +27,7 @@
       "formats": {
         "httpd" : {
           "type" : "httpd",
-          "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+          "logFormat" : "common\ncombined",
           "maxErrors": 0,
           "flattenWildcards": false
         }
diff --git a/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
index 2dd97fa..f240a82 100644
--- a/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
+++ b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
@@ -23,6 +23,7 @@ import org.apache.drill.common.exceptions.DrillRuntimeException;
 import org.apache.drill.common.types.TypeProtos.MinorType;
 import org.apache.drill.exec.physical.rowSet.RowSet;
 import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.metadata.SchemaBuilder;
 import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.rpc.RpcException;
@@ -36,6 +37,8 @@ import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 import java.nio.file.Paths;
+import java.util.stream.Collectors;
+
 import static org.apache.drill.test.QueryTestUtil.generateCompressedFile;
 import static org.junit.Assert.assertEquals;
 import static org.apache.drill.test.rowSet.RowSetUtilities.mapArray;
@@ -271,303 +274,181 @@ public class TestHTTPDLogReader extends ClusterTest {
     new RowSetComparison(expected).verifyAndClearAll(results);
   }
 
+  private TupleMetadata expectedAllFieldsSchema() {
+    return new SchemaBuilder()
+            .addNullable("connection_client_host", MinorType.VARCHAR)
+            .addNullable("connection_client_host_last", MinorType.VARCHAR)
+            .addNullable("connection_client_logname", MinorType.BIGINT)
+            .addNullable("connection_client_logname_last", MinorType.BIGINT)
+            .addNullable("connection_client_user", MinorType.VARCHAR)
+            .addNullable("connection_client_user_last", MinorType.VARCHAR)
+            .addNullable("request_firstline", MinorType.VARCHAR)
+            .addNullable("request_firstline_method", MinorType.VARCHAR)
+            .addNullable("request_firstline_original", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_method", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_protocol", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_port", MinorType.BIGINT)
+            .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR)
+            .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR)
+            .addNullable("request_firstline_protocol", MinorType.VARCHAR)
+            .addNullable("request_firstline_protocol_version", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_host", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_path", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_port", MinorType.BIGINT)
+            .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_query", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_ref", MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR)
+            .addNullable("request_receive_time", MinorType.TIMESTAMP)
+            .addNullable("request_receive_time_date", MinorType.DATE)
+            .addNullable("request_receive_time_date__utc", MinorType.DATE)
+            .addNullable("request_receive_time_day", MinorType.BIGINT)
+            .addNullable("request_receive_time_day__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
+            .addNullable("request_receive_time_hour", MinorType.BIGINT)
+            .addNullable("request_receive_time_hour__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last", MinorType.TIMESTAMP)
+            .addNullable("request_receive_time_last_date", MinorType.DATE)
+            .addNullable("request_receive_time_last_date__utc", MinorType.DATE)
+            .addNullable("request_receive_time_last_day", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP)
+            .addNullable("request_receive_time_last_hour", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_minute", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_month", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR)
+            .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR)
+            .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_second", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_time", MinorType.TIME)
+            .addNullable("request_receive_time_last_time__utc", MinorType.TIME)
+            .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR)
+            .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_year", MinorType.BIGINT)
+            .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_microsecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_millisecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_minute", MinorType.BIGINT)
+            .addNullable("request_receive_time_minute__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_month", MinorType.BIGINT)
+            .addNullable("request_receive_time_month__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_monthname", MinorType.VARCHAR)
+            .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR)
+            .addNullable("request_receive_time_nanosecond", MinorType.BIGINT)
+            .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_second", MinorType.BIGINT)
+            .addNullable("request_receive_time_second__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_time", MinorType.TIME)
+            .addNullable("request_receive_time_time__utc", MinorType.TIME)
+            .addNullable("request_receive_time_timezone", MinorType.VARCHAR)
+            .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT)
+            .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_weekyear", MinorType.BIGINT)
+            .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT)
+            .addNullable("request_receive_time_year", MinorType.BIGINT)
+            .addNullable("request_receive_time_year__utc", MinorType.BIGINT)
+            .addNullable("request_referer", MinorType.VARCHAR)
+            .addNullable("request_referer_host", MinorType.VARCHAR)
+            .addNullable("request_referer_last", MinorType.VARCHAR)
+            .addNullable("request_referer_last_host", MinorType.VARCHAR)
+            .addNullable("request_referer_last_path", MinorType.VARCHAR)
+            .addNullable("request_referer_last_port", MinorType.BIGINT)
+            .addNullable("request_referer_last_protocol", MinorType.VARCHAR)
+            .addNullable("request_referer_last_query", MinorType.VARCHAR)
+            .addNullable("request_referer_last_ref", MinorType.VARCHAR)
+            .addNullable("request_referer_last_userinfo", MinorType.VARCHAR)
+            .addNullable("request_referer_path", MinorType.VARCHAR)
+            .addNullable("request_referer_port", MinorType.BIGINT)
+            .addNullable("request_referer_protocol", MinorType.VARCHAR)
+            .addNullable("request_referer_query", MinorType.VARCHAR)
+            .addNullable("request_referer_ref", MinorType.VARCHAR)
+            .addNullable("request_referer_userinfo", MinorType.VARCHAR)
+            .addNullable("request_status_last", MinorType.VARCHAR)
+            .addNullable("request_user-agent", MinorType.VARCHAR)
+            .addNullable("request_user-agent_last", MinorType.VARCHAR)
+            .addNullable("response_body_bytes", MinorType.BIGINT)
+            .addNullable("response_body_bytes_last", MinorType.BIGINT)
+            .addNullable("response_body_bytesclf", MinorType.BIGINT)
+            .add("request_firstline_original_uri_query_$", MinorType.MAP)
+            .add("request_firstline_uri_query_$", MinorType.MAP)
+            .add("request_referer_last_query_$", MinorType.MAP)
+            .add("request_referer_query_$", MinorType.MAP)
+            .build();
+  }
+
+  private RowSet expectedAllFieldsRowSet(TupleMetadata expectedSchema) {
+    return client
+            .rowSetBuilder(expectedSchema)
+            .addRow("195.154.46.135", "195.154.46.135", null, null, null, null,
+                    "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "GET",
+                    "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "GET",
+                    "HTTP/1.1", "1.1", "/linux/doing-pxe-without-dhcp-control", null, "/linux/doing-pxe-without-dhcp-control", null, null, null, null, null,
+                    "HTTP/1.1", "1.1", "/linux/doing-pxe-without-dhcp-control", null, "/linux/doing-pxe-without-dhcp-control", null, null, null, null, null,
+                    1445742685000L, new LocalDate("2015-10-25"), new LocalDate("2015-10-25"), 25, 25, 1445742685000L, 4, 3,
+                    1445742685000L, new LocalDate("2015-10-25"), new LocalDate("2015-10-25"), 25, 25, 1445742685000L, 4, 3,
+                    0, 0, 0, 0, 11, 11, 10, 10, "October", "October", 0, 0, 25, 25, new LocalTime("04:11:25"), new LocalTime("03:11:25"), "+01:00", 43, 43, 2015, 2015, 2015, 2015,
+                    0, 0, 0, 0, 11, 11, 10, 10, "October", "October", 0, 0, 25, 25, new LocalTime("04:11:25"), new LocalTime("03:11:25"), "+01:00", 43, 43, 2015, 2015, 2015, 2015,
+                    "http://howto.basjes.nl/", "howto.basjes.nl",
+                    "http://howto.basjes.nl/", "howto.basjes.nl",
+                    "/", null, "http", null, null, null,
+                    "/", null, "http", null, null, null,
+                    "200",
+                    "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0",
+                    "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0",
+                    24323, 24323, 24323, mapArray(), mapArray(), mapArray(), mapArray())
+            .build();
+  }
+
   @Test
   public void testStarRowSet() throws Exception {
     String sql = "SELECT * FROM cp.`httpd/hackers-access-really-small.httpd`";
 
     RowSet results = client.queryBuilder().sql(sql).rowSet();
 
-    TupleMetadata expectedSchema = new SchemaBuilder()
-      .addNullable("request_referer_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_time", MinorType.TIME)
-      .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_microsecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_protocol", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR)
-      .addNullable("request_referer_host", MinorType.VARCHAR)
-      .addNullable("request_receive_time_month__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_minute", MinorType.BIGINT)
-      .addNullable("request_firstline_protocol_version", MinorType.VARCHAR)
-      .addNullable("request_receive_time_time__utc", MinorType.TIME)
-      .addNullable("request_referer_last_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT)
-      .addNullable("request_referer_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_minute", MinorType.BIGINT)
-      .addNullable("connection_client_host_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri", MinorType.VARCHAR)
-      .addNullable("request_firstline", MinorType.VARCHAR)
-      .addNullable("request_receive_time_nanosecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_day", MinorType.BIGINT)
-      .addNullable("request_referer_port", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_year", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_date", MinorType.DATE)
-      .addNullable("request_receive_time_last_time__utc", MinorType.TIME)
-      .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_method", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_uri", MinorType.VARCHAR)
-      .addNullable("request_referer_last_host", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
-      .addNullable("connection_client_logname", MinorType.BIGINT)
-      .addNullable("response_body_bytes", MinorType.BIGINT)
-      .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_hour", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_host", MinorType.VARCHAR)
-      .addNullable("request_referer_last_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT)
-      .addNullable("request_user-agent", MinorType.VARCHAR)
-      .addNullable("request_receive_time_weekyear", MinorType.BIGINT)
-      .addNullable("request_receive_time_timezone", MinorType.VARCHAR)
-      .addNullable("response_body_bytesclf", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_date__utc", MinorType.DATE)
-      .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT)
-      .addNullable("request_referer_last_protocol", MinorType.VARCHAR)
-      .addNullable("request_firstline_uri_query", MinorType.VARCHAR)
-      .addNullable("request_receive_time_minute__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR)
-      .addNullable("request_referer_query", MinorType.VARCHAR)
-      .addNullable("request_receive_time_date", MinorType.DATE)
-      .addNullable("request_firstline_uri_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT)
-      .addNullable("request_referer_last_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_second", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR)
-      .addNullable("request_firstline_method", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_millisecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_day__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_year__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_second", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR)
-      .addNullable("connection_client_logname_last", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_year", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR)
-      .addNullable("connection_client_host", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR)
-      .addNullable("request_referer_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR)
-      .addNullable("request_referer_path", MinorType.VARCHAR)
-      .addNullable("request_receive_time_monthname", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_month", MinorType.BIGINT)
-      .addNullable("request_referer_last_query", MinorType.VARCHAR)
-      .addNullable("request_firstline_uri_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_day", MinorType.BIGINT)
-      .addNullable("request_receive_time_time", MinorType.TIME)
-      .addNullable("request_status_original", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT)
-      .addNullable("request_user-agent_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT)
-      .addNullable("request_firstline_original", MinorType.VARCHAR)
-      .addNullable("request_status", MinorType.VARCHAR)
-      .addNullable("request_referer_last_path", MinorType.VARCHAR)
-      .addNullable("request_receive_time_month", MinorType.BIGINT)
-      .addNullable("request_referer", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT)
-      .addNullable("request_referer_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR)
-      .addNullable("response_body_bytes_last", MinorType.BIGINT)
-      .addNullable("request_receive_time", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_path", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_date__utc", MinorType.DATE)
-      .addNullable("request_receive_time_last", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_hour", MinorType.BIGINT)
-      .addNullable("request_receive_time_hour__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_second__utc", MinorType.BIGINT)
-      .addNullable("connection_client_user_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT)
-      .addNullable("connection_client_user", MinorType.VARCHAR)
-      .add("request_firstline_original_uri_query_$", MinorType.MAP)
-      .add("request_referer_query_$", MinorType.MAP)
-      .add("request_referer_last_query_$", MinorType.MAP)
-      .add("request_firstline_uri_query_$", MinorType.MAP)
-      .build();
-
-    RowSet expected = client.rowSetBuilder(expectedSchema)
-      .addRow(null,  new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes" +
-          ".nl/",
-        11, "195.154.46.135", 0,
-        "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03" +
-          ":11:25"),
-        3, "1" +
-          ".1", "GET",
-        2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015, "Mozilla" +
-          "/5" +
-          ".0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10" +
-          "-25"), null, 25,
-        null, 25,
-        "October", "GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null,
-        null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control " +
-          "HTTP/1.1", "200", "/",
-        10, "http://howto.basjes.nl/", 25, "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L,
-        0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray())
-      .build();
+    TupleMetadata expectedSchema = expectedAllFieldsSchema();
 
-    new RowSetComparison(expected).verifyAndClearAll(results);
+    RowSet expectedRowSet = expectedAllFieldsRowSet(expectedSchema);
+    new RowSetComparison(expectedRowSet).verifyAndClearAll(results);
   }
 
   @Test
   public void testExplicitAllFields() throws Exception {
-    String sql = "SELECT `request_referer_ref`, 
`request_receive_time_last_time`, `request_firstline_uri_protocol`, 
`request_receive_time_microsecond`, 
`request_receive_time_last_microsecond__utc`, 
`request_firstline_original_protocol`, `request_firstline_original_uri_host`, 
`request_referer_host`, `request_receive_time_month__utc`, 
`request_receive_time_last_minute`, `request_firstline_protocol_version`, 
`request_receive_time_time__utc`, `request_referer_last_ref`, 
`request_receive_time [...]
+    TupleMetadata expectedSchema = expectedAllFieldsSchema();
 
-    RowSet results = client.queryBuilder().sql(sql).rowSet();
+    // To avoid typos we generate the SQL from the schema.
+    String sql = "SELECT `" +
+            expectedSchema
+                    .toFieldList()
+                    .stream()
+                    .map(MaterializedField::getName)
+                    .collect(Collectors.joining("`, `")) +
+            "` FROM cp.`httpd/hackers-access-really-small.httpd`";
 
-    TupleMetadata expectedSchema = new SchemaBuilder()
-      .addNullable("request_referer_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_time", MinorType.TIME)
-      .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_microsecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_protocol", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR)
-      .addNullable("request_referer_host", MinorType.VARCHAR)
-      .addNullable("request_receive_time_month__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_minute", MinorType.BIGINT)
-      .addNullable("request_firstline_protocol_version", MinorType.VARCHAR)
-      .addNullable("request_receive_time_time__utc", MinorType.TIME)
-      .addNullable("request_referer_last_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT)
-      .addNullable("request_referer_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_minute", MinorType.BIGINT)
-      .addNullable("connection_client_host_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri", MinorType.VARCHAR)
-      .addNullable("request_firstline", MinorType.VARCHAR)
-      .addNullable("request_receive_time_nanosecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_day", MinorType.BIGINT)
-      .addNullable("request_referer_port", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_year", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_date", MinorType.DATE)
-      .addNullable("request_receive_time_last_time__utc", MinorType.TIME)
-      .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_method", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_uri", MinorType.VARCHAR)
-      .addNullable("request_referer_last_host", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
-      .addNullable("connection_client_logname", MinorType.BIGINT)
-      .addNullable("response_body_bytes", MinorType.BIGINT)
-      .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_hour", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_host", MinorType.VARCHAR)
-      .addNullable("request_referer_last_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT)
-      .addNullable("request_user-agent", MinorType.VARCHAR)
-      .addNullable("request_receive_time_weekyear", MinorType.BIGINT)
-      .addNullable("request_receive_time_timezone", MinorType.VARCHAR)
-      .addNullable("response_body_bytesclf", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_date__utc", MinorType.DATE)
-      .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT)
-      .addNullable("request_referer_last_protocol", MinorType.VARCHAR)
-      .addNullable("request_firstline_uri_query", MinorType.VARCHAR)
-      .addNullable("request_receive_time_minute__utc", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR)
-      .addNullable("request_referer_query", MinorType.VARCHAR)
-      .addNullable("request_receive_time_date", MinorType.DATE)
-      .addNullable("request_firstline_uri_port", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT)
-      .addNullable("request_referer_last_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_second", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR)
-      .addNullable("request_firstline_method", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_millisecond", MinorType.BIGINT)
-      .addNullable("request_receive_time_day__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_year__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_second", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR)
-      .addNullable("connection_client_logname_last", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_year", MinorType.BIGINT)
-      .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR)
-      .addNullable("connection_client_host", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR)
-      .addNullable("request_referer_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR)
-      .addNullable("request_referer_path", MinorType.VARCHAR)
-      .addNullable("request_receive_time_monthname", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_month", MinorType.BIGINT)
-      .addNullable("request_referer_last_query", MinorType.VARCHAR)
-      .addNullable("request_firstline_uri_ref", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_day", MinorType.BIGINT)
-      .addNullable("request_receive_time_time", MinorType.TIME)
-      .addNullable("request_status_original", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT)
-      .addNullable("request_user-agent_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT)
-      .addNullable("request_firstline_original", MinorType.VARCHAR)
-      .addNullable("request_status", MinorType.VARCHAR)
-      .addNullable("request_referer_last_path", MinorType.VARCHAR)
-      .addNullable("request_receive_time_month", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT)
-      .addNullable("request_referer", MinorType.VARCHAR)
-      .addNullable("request_referer_protocol", MinorType.VARCHAR)
-      .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR)
-      .addNullable("response_body_bytes_last", MinorType.BIGINT)
-      .addNullable("request_receive_time", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT)
-      .addNullable("request_firstline_uri_path", MinorType.VARCHAR)
-      .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR)
-      .addNullable("request_receive_time_date__utc", MinorType.DATE)
-      .addNullable("request_receive_time_last", MinorType.TIMESTAMP)
-      .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_last_hour", MinorType.BIGINT)
-      .addNullable("request_receive_time_hour__utc", MinorType.BIGINT)
-      .addNullable("request_receive_time_second__utc", MinorType.BIGINT)
-      .addNullable("connection_client_user_last", MinorType.VARCHAR)
-      .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT)
-      .addNullable("connection_client_user", MinorType.VARCHAR)
-      .add("request_firstline_original_uri_query_$", MinorType.MAP)
-      .add("request_referer_query_$", MinorType.MAP)
-      .add("request_referer_last_query_$", MinorType.MAP)
-      .add("request_firstline_uri_query_$", MinorType.MAP)
-      .build();
-
-    RowSet expected = client.rowSetBuilder(expectedSchema)
-      .addRow(null, new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes.nl/",
-        11, "195.154.46.135", 0,
-        "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03" +
-          ":11:25"), 3, "1" +
-          ".1", "GET",
-        2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015, "Mozilla" +
-          "/5" +
-          ".0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10" +
-          "-25"), null, 25, null, 25,
-        "October", "GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null,
-        null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control " +
-          "HTTP/1.1", "200", "/",
-        10, 25, "http://howto.basjes.nl/", "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L,
-        0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray())
-      .build();
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
 
-    new RowSetComparison(expected).verifyAndClearAll(results);
+    RowSet expectedRowSet = expectedAllFieldsRowSet(expectedSchema);
+    new RowSetComparison(expectedRowSet).verifyAndClearAll(results);
   }
 
   @Test
diff --git a/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReaderUserAgent.java b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReaderUserAgent.java
new file mode 100644
index 0000000..8b23efb
--- /dev/null
+++ b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReaderUserAgent.java
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.nio.file.Paths;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+@Category(RowSetTests.class)
+public class TestHTTPDLogReaderUserAgent extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+    // Copy the httpd test resources (the sample log files) into the test root directory
+    dirTestWatcher.copyResourceToRoot(Paths.get("httpd/"));
+
+    defineHttpdPlugin();
+  }
+
+  private static void defineHttpdPlugin() {
+    Map<String, FormatPluginConfig> formats = new HashMap<>();
+    formats.put("multiformat", new HttpdLogFormatConfig(
+            Collections.singletonList("access_log"),
+            "combined" + '\n' +
+            "common" + '\n' +
+            "%h %l %u %t \"%r\" %s %b \"%{Referer}i\"" + '\n' +
+            "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"" + '\n' +
+            "%%%h %a %A %l %u %t \"%r\" %>s %b %p \"%q\" \"%!200,304,302{Referer}i\" %D " +
+            "\"%200{User-agent}i\" \"%{Cookie}i\" \"%{Set-Cookie}o\" \"%{If-None-Match}i\" \"%{Etag}o\"" + '\n',
+            null,
+            0,
+            true,
+            true,
+            null));
+
+    // Define temporary format configurations for the "cp" storage plugin.
+    cluster.defineFormats("cp", formats);
+  }
+
+  @Test
+  public void testMultiFormatUserAgent() throws RpcException {
+    String sql =
+            "SELECT                                                       " +
+            "          `request_receive_time_epoch`,                      " +
+            "          `request_user-agent`,                              " +
+            "          `request_user-agent_device__name`,                 " +
+            "          `request_user-agent_agent__name__version__major`   " +
+            "FROM   cp.`httpd/multiformat.access_log`                     ";
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+    TupleMetadata expectedSchema = new SchemaBuilder()
+            .addNullable("request_receive_time_epoch",                      MinorType.TIMESTAMP)
+            .addNullable("request_user-agent",                              MinorType.VARCHAR)
+            .addNullable("request_user-agent_device__name",                 MinorType.VARCHAR)
+            .addNullable("request_user-agent_agent__name__version__major",  MinorType.VARCHAR)
+            .build();
+
+    RowSet expected = client.rowSetBuilder(expectedSchema)
+            .addRow(1_356_994_180_000L, "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:11.0) Gecko/20100101 Firefox/11.0", "Linux Desktop", "Firefox 11")
+            .addRow(1_356_994_181_000L, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36", "Apple Macintosh", "Chrome 66")
+            .addRow(1_388_530_181_000L, null, null, null) // This line in the input does not have the useragent field at all.
+            .build();
+
+    RowSetUtilities.verify(expected, results);
+  }
+
+  @Test
+  public void testUserAgentEnabled() throws Exception {
+    String sql =
+            "SELECT                                                               " +
+                    "          `request_receive_time_epoch`,                      " +
+                    "          `request_user-agent`,                              " +
+                    "          `request_user-agent_device__name`,                 " +
+                    "          `request_user-agent_agent__name__version__major`   " +
+                    "FROM       table(                                            " +
+                    "             cp.`httpd/typeremap.log`                        " +
+                    "                 (                                           " +
+                    "                   type => 'httpd',                          " +
+                    "                   logFormat => 'common\ncombined\n%h %l %u %t \"%r\" %>s %b %{RequestId}o\n',\n" +
+                    "                   flattenWildcards => true,                 " +
+                    "                   parseUserAgent => true                    " +
+                    "                 )                                           " +
+                    "           )                                                 ";
+
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+    TupleMetadata expectedSchema = new SchemaBuilder()
+            .addNullable("request_receive_time_epoch",                      MinorType.TIMESTAMP)
+            .addNullable("request_user-agent",                              MinorType.VARCHAR)
+            .addNullable("request_user-agent_device__name",                 MinorType.VARCHAR)
+            .addNullable("request_user-agent_agent__name__version__major",  MinorType.VARCHAR)
+            .build();
+
+    RowSet expected = client.rowSetBuilder(expectedSchema)
+            .addRow(1_388_530_181_000L,
+                    "Mozilla/5.0 (compatible; Googlebot/2.1; Yauaa Bot/42.123; +https://yauaa.basjes.nl)", "Basjes Googlebot Imitator", "Yauaa Bot 42")
+            .build();
+
+    RowSetUtilities.verify(expected, results);
+  }
+
+  @Test
+  public void testUserAgentDisabled() throws Exception {
+    String sql =
+            "SELECT                                                       " +
+            "          `request_receive_time_epoch`,                      " +
+            "          `request_user-agent`,                              " +
+            "          `request_user-agent_device__name`,                 " +
+            "          `request_user-agent_agent__name__version__major`   " +
+            "FROM       table(                                            " +
+            "             cp.`httpd/typeremap.log`                        " +
+            "                 (                                           " +
+            "                   type => 'httpd',                          " +
+            "                   logFormat => 'common\ncombined\n%h %l %u %t \"%r\" %>s %b %{RequestId}o\n',\n" +
+            "                   flattenWildcards => true                  " +
+            "                 )                                           " +
+            "           )                                                 " +
+            "LIMIT 1                                                      ";
+
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+    TupleMetadata expectedSchema = new SchemaBuilder()
+            .addNullable("request_receive_time_epoch",                      MinorType.TIMESTAMP)
+            .addNullable("request_user-agent",                              MinorType.VARCHAR)
+            .addNullable("request_user-agent_device__name",                 MinorType.VARCHAR)
+            .addNullable("request_user-agent_agent__name__version__major",  MinorType.VARCHAR)
+            .build();
+
+    RowSet expected = client.rowSetBuilder(expectedSchema)
+            .addRow(1_388_530_181_000L,
+                    "Mozilla/5.0 (compatible; Googlebot/2.1; Yauaa Bot/42.123; +https://yauaa.basjes.nl)", null, null)
+            .build();
+
+    RowSetUtilities.verify(expected, results);
+  }
+
+
+  @Test
+  public void testUserAgentAndTypeRemapping() throws Exception {
+    String sql =
+            "SELECT                                                                           \n" +
+            "          `request_receive_time_epoch`                                           \n" +
+            "        , `request_user-agent`                                                   \n" +
+            "        , `request_user-agent_device__name`                                      \n" +
+            "        , `request_user-agent_agent__name__version__major`                       \n" +
+            "        , `request_firstline_uri_query_timestamp`                                \n" +
+            "        , `request_firstline_uri_query_ua`                                       \n" +
+            "        , `request_firstline_uri_query_ua_device__name`                          \n" +
+            "        , `request_firstline_uri_query_ua_agent__name__version__major`           \n" +
+            "        , `response_header_requestid_epoch`                                      \n" +
+//            "        , *                                                                     \n"+
+            "FROM       table(                                                                \n" +
+            "             cp.`httpd/typeremap.log`                                            \n" +
+            "                 (                                                               \n" +
+            "                   type => 'httpd',                                              \n" +
+            //                  LogFormat: Mind the leading and trailing spaces! Empty lines are ignored
+            "                   logFormat => 'common\ncombined\n%h %l %u %t \"%r\" %>s %b %{RequestId}o\n',\n" +
+            "                   flattenWildcards => true,                                     \n" +
+            "                   parseUserAgent => true,                                       \n" +
+            "                   logParserRemapping => '                                       \n" +
+            "                       request.firstline.uri.query.ua        :HTTP.USERAGENT ;   \n" +
+            "                       response.header.requestid             :MOD_UNIQUE_ID  ;   \n" +
+            "                       request.firstline.uri.query.timestamp :TIME.EPOCH : LONG  \n" +
+            "                   '                                                             \n" +
+            "                 )                                                               \n" +
+            "           )                                                                     \n";
+
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+    results.print();
+
+    TupleMetadata expectedSchema = new SchemaBuilder()
+            .addNullable("request_receive_time_epoch",                                 MinorType.TIMESTAMP)
+            .addNullable("request_user-agent",                                         MinorType.VARCHAR)
+            .addNullable("request_user-agent_device__name",                            MinorType.VARCHAR)
+            .addNullable("request_user-agent_agent__name__version__major",             MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_query_timestamp",                      MinorType.TIMESTAMP)
+            .addNullable("request_firstline_uri_query_ua",                             MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_query_ua_device__name",                MinorType.VARCHAR)
+            .addNullable("request_firstline_uri_query_ua_agent__name__version__major", MinorType.VARCHAR)
+            .addNullable("response_header_requestid_epoch",                            MinorType.TIMESTAMP)
+            .build();
+
+    RowSet expected = client.rowSetBuilder(expectedSchema)
+            .addRow(// These are directly parsed from the line
+                    1_388_530_181_000L, // 2013-12-31T22:49:41.000Z
+                    "Mozilla/5.0 (compatible; Googlebot/2.1; Yauaa Bot/42.123; +https://yauaa.basjes.nl)",
+                    "Basjes Googlebot Imitator", "Yauaa Bot 42",
+
+                    // These are parsed by casting the query string parameters to something else
+                    1_607_506_430_621L, // 2020-12-09T09:33:50.621
+                    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
+                    "Apple Macintosh", "Chrome 66",
+
+                    null // No mod_unique_id field present
+            )
+            .addRow(// These are directly parsed from the line
+                    1_388_530_181_000L, // 2013-12-31T22:49:41.000Z
+                    null,               // The second line in the test file does not have a useragent field.
+                    null, null,
+
+                    // These are parsed by casting the query string parameters to something else
+                    1_607_506_430_621L, // 2020-12-09T09:33:50.621
+                    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3359.139 Safari/537.36",
+                    "Apple Macintosh", "Chrome 77",
+
+                    1_372_024_799_000L // 2013-06-23T21:59:59.000Z ==> The timestamp of the mod_unique_id value
+            )
+            .addRow(// These are directly parsed from the line
+                    1_388_530_181_000L, // 2013-12-31T22:49:41.000Z
+                    null,               // The third line in the test file does not have a useragent field.
+                    null, null,
+
+                    // These are parsed by casting the query string parameters to something else
+                    1_607_506_430_621L, // 2020-12-09T09:33:50.621
+                    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.3359.139 Safari/537.36",
+                    "Apple Macintosh", "Chrome 55",
+
+                    null // No mod_unique_id field present
+            )
+            .build();
+
+    RowSetUtilities.verify(expected, results);
+  }
+
+
+
+}
+
+
diff --git a/contrib/format-httpd/src/test/resources/httpd/multiformat.access_log b/contrib/format-httpd/src/test/resources/httpd/multiformat.access_log
new file mode 100644
index 0000000..ea5f198
--- /dev/null
+++ b/contrib/format-httpd/src/test/resources/httpd/multiformat.access_log
@@ -0,0 +1,3 @@
+%127.0.0.1 127.0.0.1 127.0.0.1 - - [31/Dec/2012:23:49:40 +0100] "GET /icons/powered_by_rh.png?aap=noot&res=1024x768 HTTP/1.1" 200 1213 80 "" "http://localhost/index.php?mies=wim" 351 "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:11.0) Gecko/20100101 Firefox/11.0" "jquery-ui-theme=Eggplant" "Apache=127.0.0.1.1344635380111339; path=/; domain=.basjes.nl" "-" "\"3780ff-4bd-4c1ce3df91380\""
+127.0.0.1 - - [31/Dec/2012:23:49:41 +0100] "GET /foo1 HTTP/1.1" 200 1213 "http://localhost/index.php?mies=wim&test=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"
+127.0.0.1 - - [31/Dec/2013:23:49:41 +0100] "GET /foo2 HTTP/1.1" 200 1213 "http://localhost/index.php?mies=zus&test=false"
diff --git a/contrib/format-httpd/src/test/resources/httpd/typeremap.log b/contrib/format-httpd/src/test/resources/httpd/typeremap.log
new file mode 100644
index 0000000..765bfde
--- /dev/null
+++ b/contrib/format-httpd/src/test/resources/httpd/typeremap.log
@@ -0,0 +1,3 @@
+127.0.0.1 - - [31/Dec/2013:23:49:41 +0100] "GET /something.php?ua=Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_12_3)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/66.0.3359.139%20Safari/537.36&timestamp=1607506430621 HTTP/1.1" 200 1213 "http://localhost/index.php?mies=zus&test=false" "Mozilla/5.0 (compatible; Googlebot/2.1; Yauaa Bot/42.123; +https://yauaa.basjes.nl)"
+127.0.0.1 - - [31/Dec/2013:23:49:41 +0100] "GET /something.php?ua=Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_12_3)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/77.0.3359.139%20Safari/537.36&timestamp=1607506430621 HTTP/1.1" 200 1213 Ucdv38CoEJwAAEusp6EAAADz
+127.0.0.1 - - [31/Dec/2013:23:49:41 +0100] "GET /something.php?ua=Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_12_3)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/55.0.3359.139%20Safari/537.36&timestamp=1607506430621 HTTP/1.1" 200 1213
diff --git a/contrib/format-httpd/src/test/resources/logback-test.txt b/contrib/format-httpd/src/test/resources/logback-test.txt
index 2adcf81..e26ec99 100644
--- a/contrib/format-httpd/src/test/resources/logback-test.txt
+++ b/contrib/format-httpd/src/test/resources/logback-test.txt
@@ -61,5 +61,8 @@
     <level value="debug" />
     <appender-ref ref="STDOUT" />
   </logger>
-
+  <logger name="nl.basjes.parse" additivity="false">
+    <level value="info" />
+    <appender-ref ref="STDOUT" />
+  </logger>
 </configuration>
\ No newline at end of file
diff --git a/contrib/udfs/pom.xml b/contrib/udfs/pom.xml
index a220005..93d0a91 100644
--- a/contrib/udfs/pom.xml
+++ b/contrib/udfs/pom.xml
@@ -63,10 +63,11 @@
       <artifactId>proj4j</artifactId>
       <version>0.1.0</version>
     </dependency>
+
     <dependency>
       <groupId>nl.basjes.parse.useragent</groupId>
       <artifactId>yauaa</artifactId>
-      <version>5.19</version>
+      <version>${yauaa.version}</version>
     </dependency>
 
     <!-- Test dependencies -->
diff --git a/exec/java-exec/pom.xml b/exec/java-exec/pom.xml
index 8417906..43f73e5 100644
--- a/exec/java-exec/pom.xml
+++ b/exec/java-exec/pom.xml
@@ -529,7 +529,7 @@
     <dependency>
       <groupId>nl.basjes.parse.httpdlog</groupId>
       <artifactId>httpdlog-parser</artifactId>
-      <version>5.3</version>
+      <version>${httpdlog-parser.version}</version>
       <exclusions>
         <exclusion>
           <groupId>commons-codec</groupId>
diff --git a/pom.xml b/pom.xml
index 0a98738..32c6105 100644
--- a/pom.xml
+++ b/pom.xml
@@ -127,6 +127,8 @@
     <xerces.version>2.12.0</xerces.version>
     <commons.configuration.version>1.10</commons.configuration.version>
     <commons.beanutils.version>1.9.4</commons.beanutils.version>
+    <httpdlog-parser.version>5.7</httpdlog-parser.version>
+    <yauaa.version>5.20</yauaa.version>
   </properties>
 
   <scm>
@@ -408,6 +410,7 @@
             <exclude>**/ssl/*.p12</exclude>
             <exclude>**/*.tbl</exclude>
             <exclude>**/*.httpd</exclude>
+            <exclude>**/*.access_log</exclude>
             <exclude>**/*.autotools</exclude>
             <exclude>**/*.cproject</exclude>
             <exclude>**/*.drill</exclude>
