[
https://issues.apache.org/jira/browse/DRILL-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247347#comment-17247347
]
ASF GitHub Bot commented on DRILL-7817:
---------------------------------------
nielsbasjes commented on a change in pull request #2122:
URL: https://github.com/apache/drill/pull/2122#discussion_r540301492
##########
File path: contrib/format-httpd/README.md
##########
@@ -1,35 +1,39 @@
# Web Server Log Format Plugin (HTTPD)
This plugin enables Drill to read and query httpd (Apache Web Server) and
nginx access logs natively. This plugin uses the work by [Niels
Basjes](https://github.com/nielsbasjes
-) which is available here: https://github.com/nielsbasjes/logparser.
+) which is available here: https://github.com/nielsbasjes/logparser .
## Configuration
-There are five fields which you can to configure in order for Drill to read web server logs. In general the defaults should be fine, however the fields are:
+There are several fields which you can specify in order for Drill to read web server logs. In general the defaults should be fine, however the fields are:
* **`logFormat`**: The log format string is the format string found in your
web server configuration. If you have multiple logFormats then you can add all
of them in this
single parameter separated by a newline (`\n`). The parser will automatically
select the first matching format.
+ Note that the well known formats `common`, `combined`, `combinedio`, `referer` and `agent` are also accepted as logFormat.
Review comment:
I have already added some documentation about the built-in formats here. Is this
enough?
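To make the "first matching format" wording concrete, here is a minimal sketch of the selection rule the README describes. It is not how logparser implements it: the class name, the method `selectFormat`, and the two regular expressions are hypothetical stand-ins for the `common` and `combined` formats, used only so the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class FirstMatchingFormat {
  // Hypothetical regex stand-ins for two of the named formats. The real
  // parser compiles LogFormat strings and does not use these patterns.
  private static final String COMMON_RE =
      "\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"[^\"]*\" \\d{3} \\S+";
  private static final String COMBINED_RE =
      COMMON_RE + " \"[^\"]*\" \"[^\"]*\"";

  private static final Map<String, Pattern> KNOWN = new LinkedHashMap<>();
  static {
    KNOWN.put("common", Pattern.compile(COMMON_RE));
    KNOWN.put("combined", Pattern.compile(COMBINED_RE));
  }

  /**
   * Splits logFormat on newlines and returns the first format name whose
   * pattern matches the entire log line, or empty if none matches.
   */
  public static Optional<String> selectFormat(String logFormat, String line) {
    for (String name : logFormat.split("\n")) {
      Pattern pattern = KNOWN.get(name.trim());
      if (pattern != null && pattern.matcher(line).matches()) {
        return Optional.of(name.trim());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String combinedLine = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
        + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
        + "\"http://www.example.com/start.html\" "
        + "\"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
    // "common" is tried first but does not cover the trailing referer and
    // user-agent fields, so the second entry is selected.
    System.out.println(selectFormat("common\ncombined", combinedLine));
  }
}
```

With `"common\ncombined"` as the parameter, a line without the referer and user-agent fields is picked up by `common`, and a line with them falls through to `combined`.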
##########
File path: contrib/format-httpd/README.md
##########
@@ -1,35 +1,39 @@
# Web Server Log Format Plugin (HTTPD)
This plugin enables Drill to read and query httpd (Apache Web Server) and
nginx access logs natively. This plugin uses the work by [Niels
Basjes](https://github.com/nielsbasjes
-) which is available here: https://github.com/nielsbasjes/logparser.
+) which is available here: https://github.com/nielsbasjes/logparser .
Review comment:
Yes, sorry. I thought it would break the link.
##########
File path: contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
##########
@@ -35,45 +36,61 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
+import java.util.ArrayList;
import java.util.EnumSet;
-import java.util.HashMap;
import java.util.List;
import java.util.Map;
+import java.util.TreeMap;
+
+import static nl.basjes.parse.core.Casts.DOUBLE;
+import static nl.basjes.parse.core.Casts.DOUBLE_ONLY;
+import static nl.basjes.parse.core.Casts.LONG;
+import static nl.basjes.parse.core.Casts.LONG_ONLY;
+import static nl.basjes.parse.core.Casts.STRING;
+import static nl.basjes.parse.core.Casts.STRING_ONLY;
public class HttpdParser {
private static final Logger logger = LoggerFactory.getLogger(HttpdParser.class);
public static final String PARSER_WILDCARD = ".*";
- public static final String REMAPPING_FLAG = "#";
private final Parser<HttpdLogRecord> parser;
private final List<SchemaPath> requestedColumns;
private final Map<String, MinorType> mappedColumns;
+ private final Map<String, Casts> columnCasts;
private final HttpdLogRecord record;
private final String logFormat;
+ private final boolean parseUserAgent;
+ private final String logParserRemapping;
private Map<String, String> requestedPaths;
- private EnumSet<Casts> casts;
-
- public HttpdParser(final String logFormat, final String timestampFormat, final boolean flattenWildcards, final EasySubScan scan) {
+ public HttpdParser(
+ final String logFormat,
+ final String timestampFormat,
+ final boolean flattenWildcards,
+ final boolean parseUserAgent,
+ final String logParserRemapping,
+ final EasySubScan scan) {
Preconditions.checkArgument(logFormat != null && !logFormat.trim().isEmpty(), "logFormat cannot be null or empty");
this.logFormat = logFormat;
+ this.parseUserAgent = parseUserAgent;
this.record = new HttpdLogRecord(timestampFormat, flattenWildcards);
- if (timestampFormat == null) {
- this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat);
- } else {
- this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
Review comment:
No, the parser already handles this case:
https://github.com/nielsbasjes/logparser/blob/master/httpdlog/httpdlog-parser/src/main/java/nl/basjes/parse/httpdlog/dissectors/TimeStampDissector.java#L65
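As a rough illustration of why the explicit null check in the plugin is redundant, the fallback can be sketched as below. This is a self-contained approximation and not the code at the linked line: the class, the method names, and the default pattern `dd/MMM/yyyy:HH:mm:ss Z` (written for `java.time`) are assumptions made for the example.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TimestampFallback {
  // Assumed default: the usual Apache access-log layout,
  // for example 10/Oct/2000:13:55:36 -0700.
  static final String ASSUMED_DEFAULT_PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";

  /** Returns the caller's pattern, or the assumed default when it is null or blank. */
  static String effectivePattern(String timestampFormat) {
    if (timestampFormat == null || timestampFormat.trim().isEmpty()) {
      return ASSUMED_DEFAULT_PATTERN;
    }
    return timestampFormat;
  }

  /** Parses a timestamp with the effective pattern; English month names are assumed. */
  static OffsetDateTime parse(String timestampFormat, String value) {
    DateTimeFormatter formatter =
        DateTimeFormatter.ofPattern(effectivePattern(timestampFormat), Locale.ENGLISH);
    return OffsetDateTime.parse(value, formatter);
  }

  public static void main(String[] args) {
    // A null format behaves the same as passing the default explicitly.
    System.out.println(parse(null, "10/Oct/2000:13:55:36 -0700"));
  }
}
```

Because the fallback lives in one place, the caller can pass `timestampFormat` through unconditionally instead of choosing between two constructors.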
##########
File path: contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
##########
@@ -5,7 +5,7 @@
"formats": {
"httpd" : {
"type" : "httpd",
- "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+ "logFormat" : "common\ncombined",
Review comment:
I have already done that. Please check whether the documentation is sufficient
(and clear enough for others).
----------------------------------------------------------------
> Add direct Yauaa support for HTTPD Format Plugin.
> -------------------------------------------------
>
> Key: DRILL-7817
> URL: https://issues.apache.org/jira/browse/DRILL-7817
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Niels Basjes
> Assignee: Niels Basjes
> Priority: Minor
>
> Enhancement of having the Yauaa useragent parser immediately integrated with
> the HTTPD logparser.