nielsbasjes commented on pull request #2112:
URL: https://github.com/apache/drill/pull/2112#issuecomment-727902776


   As an experiment I added this to your code:
   ```
   diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
   index 10a9e35b4..02ae984ac 100644
   --- a/contrib/format-httpd/pom.xml
   +++ b/contrib/format-httpd/pom.xml
   @@ -51,6 +51,12 @@
          </exclusions>
        </dependency>
    
   +    <dependency>
   +      <groupId>nl.basjes.parse.useragent</groupId>
   +      <artifactId>yauaa-logparser</artifactId>
   +      <version>${yauaa.version}</version>
   +    </dependency>
   +
        <!-- Test dependencies -->
        <dependency>
          <groupId>org.apache.drill.exec</groupId>
   diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   index 326a074d1..8a0f23063 100644
   --- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   +++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   @@ -17,6 +17,7 @@
     */
    package org.apache.drill.exec.store.httpd;
    
   +import nl.basjes.parse.useragent.dissector.UserAgentDissector;
    import org.apache.drill.common.expression.SchemaPath;
    import org.apache.drill.common.types.TypeProtos;
    import org.apache.drill.common.types.TypeProtos.MinorType;
   @@ -67,6 +68,7 @@ public class HttpdParser {
        } else {
          this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
        }
   +    this.parser.addDissector(new UserAgentDissector());
        this.requestedColumns = scan.getColumns();
    
        if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
   @@ -119,6 +121,7 @@ public class HttpdParser {
         * because this will be the slowest parsing path possible for the specified format.
         */
        Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
   +    dummy.addDissector(new UserAgentDissector());
        dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
    
        for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
   ```
   
   Now I ran into something strange.
   When I run this test code:
   ```
       String sql = "SELECT `request_user-agent`, `request_user-agent_device__name`, `request_user-agent_agent__name__version__major` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 1";
       RowSet results = client.queryBuilder().sql(sql).rowSet();
       results.print();
   ```
   
   I see this output, in which both dissected values appear twice (`DesktopDesktop`, `Firefox 35Firefox 35`):
   ```
   #: `request_user-agent` VARCHAR, `request_user-agent_device__name` VARCHAR, `request_user-agent_agent__name__version__major` VARCHAR
   0: "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", "DesktopDesktop", "Firefox 35Firefox 35"
   ```
   
   At the moment I think this is a bug in the Yauaa dissector.
   I'm digging into it.
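
While the cause is being tracked down, the symptom in the output above (a value that is some string written twice in a row) can be checked for mechanically. The following is a minimal sketch only: `DoubledValueCheck` and `repeatedHalf` are hypothetical names invented for illustration and are not part of Drill, logparser, or Yauaa, and the check says nothing about where the duplication comes from.

```java
import java.util.Optional;

/** Hypothetical helper (not part of Drill or Yauaa): spots values such as "DesktopDesktop". */
final class DoubledValueCheck {

  private DoubledValueCheck() {
  }

  /**
   * Returns the repeated half when {@code value} is exactly one non-empty string
   * written twice back to back; otherwise returns an empty Optional.
   */
  static Optional<String> repeatedHalf(String value) {
    // A doubled value must be non-empty and have an even length.
    if (value == null || value.isEmpty() || value.length() % 2 != 0) {
      return Optional.empty();
    }
    int mid = value.length() / 2;
    String firstHalf = value.substring(0, mid);
    String secondHalf = value.substring(mid);
    return firstHalf.equals(secondHalf) ? Optional.of(firstHalf) : Optional.empty();
  }
}
```

A test could call `DoubledValueCheck.repeatedHalf(...)` on each VARCHAR column of the result row and fail when a non-empty result comes back. Note that legitimately repetitive values (for example `1111`) would also match, so this is a heuristic for spotting the symptom, not a proof of the bug.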
   
   
   

