nielsbasjes commented on pull request #2112:
URL: https://github.com/apache/drill/pull/2112#issuecomment-727902776
As an experiment I added this to your code:
```
diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
index 10a9e35b4..02ae984ac 100644
--- a/contrib/format-httpd/pom.xml
+++ b/contrib/format-httpd/pom.xml
@@ -51,6 +51,12 @@
</exclusions>
</dependency>
+ <dependency>
+ <groupId>nl.basjes.parse.useragent</groupId>
+ <artifactId>yauaa-logparser</artifactId>
+ <version>${yauaa.version}</version>
+ </dependency>
+
<!-- Test dependencies -->
<dependency>
<groupId>org.apache.drill.exec</groupId>
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
index 326a074d1..8a0f23063 100644
--- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
@@ -17,6 +17,7 @@
*/
package org.apache.drill.exec.store.httpd;
+import nl.basjes.parse.useragent.dissector.UserAgentDissector;
import org.apache.drill.common.expression.SchemaPath;
import org.apache.drill.common.types.TypeProtos;
import org.apache.drill.common.types.TypeProtos.MinorType;
@@ -67,6 +68,7 @@ public class HttpdParser {
} else {
this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
}
+ this.parser.addDissector(new UserAgentDissector());
this.requestedColumns = scan.getColumns();
if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
@@ -119,6 +121,7 @@ public class HttpdParser {
* because this will be the slowest parsing path possible for the specified format.
*/
Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
+ dummy.addDissector(new UserAgentDissector());
dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
```
Now I ran into something strange.
When I run this test code:
```
String sql = "SELECT `request_user-agent`, "
    + "`request_user-agent_device__name`, "
    + "`request_user-agent_agent__name__version__major` "
    + "FROM cp.`httpd/hackers-access-small.httpd` LIMIT 1";
RowSet results = client.queryBuilder().sql(sql).rowSet();
results.print();
```
I see this output, in which each dissected value appears twice, back to back:
```
#: `request_user-agent` VARCHAR, `request_user-agent_device__name` VARCHAR, `request_user-agent_agent__name__version__major` VARCHAR
0: "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", "DesktopDesktop", "Firefox 35Firefox 35"
```
At the moment I think this is a bug in the Yauaa dissector.
I'm digging into it.
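While digging, a small check along these lines could help flag the symptom in a test. This is only a sketch: `DoubledValueCheck` and `isDoubled` are hypothetical names that exist in neither Drill nor Yauaa, and the check only detects a value that is some string repeated exactly twice, so a legitimately repetitive value such as "aa" would also match.

```java
public class DoubledValueCheck {

    // Hypothetical helper (not part of Drill or Yauaa):
    // returns true when the value is a non-empty string repeated exactly twice.
    public static boolean isDoubled(String value) {
        if (value == null || value.isEmpty() || value.length() % 2 != 0) {
            return false;
        }
        int half = value.length() / 2;
        // Compare the first half against the second half.
        return value.substring(0, half).equals(value.substring(half));
    }

    public static void main(String[] args) {
        // The two suspicious values from the query output above.
        System.out.println(isDoubled("DesktopDesktop"));
        System.out.println(isDoubled("Firefox 35Firefox 35"));
        // What the value would presumably look like without the duplication.
        System.out.println(isDoubled("Desktop"));
    }
}
```

Both values from the output above are flagged by this check, while a plain "Desktop" is not.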