cgivre opened a new pull request #2112: URL: https://github.com/apache/drill/pull/2112
# [DRILL-7534](https://issues.apache.org/jira/browse/DRILL-7534): Convert HTTPD Format Plugin to EVF

## Description
This PR updates the HTTPD format plugin to use the Enhanced Vector Framework (EVF). Users should notice only a few changes:

1. A new configuration option, `maxErrors`, lets a user tune how fault tolerant Drill should be when reading log files.
2. Two new implicit fields have been added, `_raw` and `_matched`. They are described in the docs below.
3. The plugin now includes a limit pushdown, which significantly improves query times for queries with limits.
4. The plugin code is now in the `contrib` folder.

In addition, this PR updates the associated User Agent parsing functions to the latest version of the underlying libraries.

## Documentation
# Web Server Log Format Plugin (HTTPD)
This plugin enables Drill to read and query httpd (Apache Web Server) and nginx logs natively. It uses the work by [Niels Basjes](https://github.com/nielsbasjes), which is available here: https://github.com/nielsbasjes/logparser.

## Configuration
The following four fields control how Drill reads web server logs:

* **`logFormat`**: The log format string found in your web server configuration.
* **`timestampFormat`**: The format of the time stamps in your log files.
* **`extensions`**: The file extension of your web server logs.
* **`maxErrors`**: Sets the plugin error tolerance. When set to any value less than `0`, Drill will ignore all errors.

```json
"httpd" : {
  "type" : "httpd",
  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
  "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
  "maxErrors": 0
}
```

### Implicit Columns
Data queried by this plugin will return two implicit columns:

* **`_raw`**: Returns the raw, unparsed log line.
* **`_matched`**: Returns `true` or `false` depending on whether the line matched the config string.

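
As a rough illustration of how these columns can be requested, the query below selects both by name and caps the result with a `LIMIT`. The `dfs` storage plugin name and the file path are hypothetical placeholders, and the sketch assumes the file is read through an `httpd` format configured as shown above.

```sql
-- Hypothetical path: substitute a log file readable through your own httpd format config.
SELECT _raw, _matched
FROM dfs.`/tmp/access.httpd`
LIMIT 10
```
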
Thus, if you wanted to see which lines in your log file did not match the config, you could use the following query:

```sql
SELECT _raw
FROM <data>
WHERE _matched = false
```

## Testing
Added additional unit tests for this plugin. Ran all unit tests for the `parse_user_agent()` UDF as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
