[
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011170#comment-15011170
]
Jim Scott commented on DRILL-3423:
----------------------------------
Jacques,
Given everything you have said here, I can see value in making some changes.
Before moving in that direction, however, a considerable number of details
still need to be covered, and I have tried to list them all below. I agree
with the idea of the functions, and I have included the ones you suggested
along with others that would be needed. These issues must be resolved before
we move in this direction.
h3. Considerations
h4. User must specify a name that Drill understands and that can be mapped into a name the parser understands
*_Option_* -- Every available format string needs a mapping to a variable name
so that the user can query that field (see the table of mappings below; the
user will reference names with underscores instead of dots).
|| Format String || Variable Name || Type ||
| %a | connection.client.ip | IP |
| %\{c}a | connection.client.peerip | IP |
| %A | connection.server.ip | IP |
| %B | response.body.bytes | BYTES |
| %b | response.body.bytesclf | BYTES |
| %\{Foobar}C | request.cookies.* | HTTP.COOKIE |
| %D | server.process.time | MICROSECONDS |
| %\{Foobar}e | server.environment.* | VARIABLE |
| %f | server.filename | FILENAME |
| %h | connection.client.host | IP |
| %H | request.protocol | PROTOCOL |
| %\{Foobar}i | request.header.* | HTTP.HEADER |
| %k | connection.keepalivecount | NUMBER |
| %l | connection.client.logname | NUMBER |
| %L | request.errorlogid | STRING |
| %m | request.method | HTTP.METHOD |
| %\{Foobar}n | server.module_note.* | STRING |
| %\{Foobar}o | response.header.* | HTTP.HEADER |
| %p | request.server.port.canonical | PORT |
| %\{canonical}p | connection.server.port.canonical | PORT |
| %\{local}p | connection.server.port | PORT |
| %\{remote}p | connection.client.port | PORT |
| %P | connection.server.child.processid | NUMBER |
| %\{pid}P | connection.server.child.processid | NUMBER |
| %\{tid}P | connection.server.child.threadid | NUMBER |
| %\{hextid}P | connection.server.child.hexthreadid | NUMBER |
| %q | request.querystring | HTTP.QUERYSTRING |
| %r | request.firstline | HTTP.FIRSTLINE |
| %R | request.handler | STRING |
| %s | request.status.original | STRING |
| %>s | request.status.last | STRING |
| %t | request.receive.time | TIME.STAMP |
| %\{msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{begin:msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{end:msec}t | request.receive.time.end.msec | TIME.EPOCH |
| %\{usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{begin:usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{end:usec}t | request.receive.time.end.usec | TIME.EPOCH.USEC |
| %\{msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{begin:msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{end:msec_frac}t | request.receive.time.end.msec_frac | TIME.EPOCH |
| %\{usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{begin:usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{end:usec_frac}t | request.receive.time.end.usec_frac | TIME.EPOCH.USEC_FRAC |
| %T | response.server.processing.time | SECONDS |
| %u | connection.client.user | STRING |
| %U | request.urlpath | URI |
| %v | connection.server.name.canonical | STRING |
| %V | connection.server.name | STRING |
| %X | response.connection.status | HTTP.CONNECTSTATUS |
| %I | request.bytes | BYTES |
| %O | response.bytes | BYTES |
| %\{cookie}i | request.cookies | HTTP.COOKIES |
| %\{set-cookie}o | response.cookies | HTTP.SETCOOKIES |
| %\{user-agent}i | request.user-agent | HTTP.USERAGENT |
| %\{referer}i | request.referer | HTTP.URI |
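To make the dot-to-underscore idea concrete, here is a minimal sketch in Python. It uses only a handful of rows from the table above; the names {{FORMAT_TO_VARIABLE}}, {{to_column_name}} and {{COLUMN_TO_VARIABLE}} are hypothetical and are not part of Drill or the logparser. Because some variable names already contain underscores (e.g. msec_frac, module_note), the reverse direction is done with a lookup table rather than by replacing underscores with dots.

```python
# Hypothetical subset of the mapping table: format string -> parser variable name.
FORMAT_TO_VARIABLE = {
    "%a": "connection.client.ip",
    "%B": "response.body.bytes",
    "%m": "request.method",
    "%>s": "request.status.last",
    "%{msec_frac}t": "request.receive.time.begin.msec_frac",
}


def to_column_name(variable_name: str) -> str:
    """Turn a dotted parser name into the underscore name a user would query."""
    return variable_name.replace(".", "_")


# Reverse lookup: underscore column name -> dotted parser name.
# A table is used because "_" -> "." is ambiguous for names such as msec_frac.
COLUMN_TO_VARIABLE = {
    to_column_name(variable): variable
    for variable in FORMAT_TO_VARIABLE.values()
}

if __name__ == "__main__":
    for column, variable in COLUMN_TO_VARIABLE.items():
        print(column, "->", variable)
```

With this shape, a query on request_receive_time_begin_msec_frac resolves back to the dotted parser name without guessing which underscores were originally dots.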
h4. There are fields which could be parsed and selected by the user that are complex (URL, URI, query string)
*_Option_* -- Provide a function to parse urls into map
{code}
{
protocol: "...",
user: "...",
password: "...",
host: "...",
port: "...",
path: "...",
query: "...",
fragment: "..."
}
{code}
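As an illustration of the map above, here is a minimal sketch using Python's standard {{urllib.parse.urlsplit}}. The function name {{parse_url}} is hypothetical, and this is not how a Drill function would be implemented; it only shows which URL component would land in which key. Note that in this sketch the port comes back as an integer and missing components come back as None or an empty string, which is an assumption rather than something decided in this issue.

```python
from urllib.parse import urlsplit


def parse_url(url: str) -> dict:
    """Split a URL into the keys proposed in the map above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user": parts.username,      # None when the URL has no user info
        "password": parts.password,  # None when the URL has no password
        "host": parts.hostname,
        "port": parts.port,          # int, or None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(parse_url("https://jim:secret@example.com:8443/a/b?x=1&y=2#top"))
```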
*_Option_* -- Provide a function to parse a query string into a map (users can
use kvgen on this if they need to)
{code}
{
"fieldName1": "fieldValue1",
"fieldName2": "fieldValue2",
...
}
{code}
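A minimal sketch of the query string map, using Python's standard {{urllib.parse.parse_qsl}}. The helper name {{parse_query_string}} is hypothetical. Two choices here are assumptions and would need to be decided for a real function: parameters with empty values are kept, and when a name repeats, the last value wins.

```python
from urllib.parse import parse_qsl


def parse_query_string(query: str) -> dict:
    """Parse 'a=1&b=2' (with or without a leading '?') into a flat map."""
    pairs = parse_qsl(query.lstrip("?"), keep_blank_values=True)
    # dict() keeps the last value when a field name is repeated.
    return dict(pairs)


if __name__ == "__main__":
    print(parse_query_string("?fieldName1=fieldValue1&fieldName2=fieldValue2"))
```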
h4. There are fields which could be parsed and selected by the user that are arbitrary (cookies, headers, etc.)
*_Option_* -- Cookies are named and contain (domain, expires, path, value)
{code}
[
name: {
domain: "...",
expires: "...",
path: "...",
value: "..."
},
...
]
{code}
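To show what the cookie structure could look like once populated, here is a minimal sketch that parses a Set-Cookie header value with Python's standard {{http.cookies.SimpleCookie}}. The helper name {{parse_set_cookie}} and the sample header are made up for illustration. The result is keyed by cookie name, which is one possible reading of the structure above.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict:
    """Return {cookie name: {domain, expires, path, value}} (hypothetical helper)."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {
        name: {
            "domain": morsel["domain"],    # empty string when not set
            "expires": morsel["expires"],  # kept as the raw date string
            "path": morsel["path"],
            "value": morsel.value,
        }
        for name, morsel in jar.items()
    }


if __name__ == "__main__":
    sample = "sid=abc123; Domain=example.com; Path=/; Expires=Wed, 21 Oct 2015 07:28:00 GMT"
    print(parse_set_cookie(sample))
```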
*_Issue to Address_*
Some details in the format string, represented by Foobar (e.g. header names),
cannot necessarily be identified beforehand. They must be accounted for, or
else the parser won't be completely effective and the user will not be able to
query the headers, cookies, and similar fields that exist in the log.
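One possible way to narrow this down, sketched here under the assumption that the configured log format string is available when the table is set up: scan it for %\{name}X directives and derive the field names that will actually occur. The names {{PREFIX_BY_DIRECTIVE}}, {{NAMED_DIRECTIVE}} and {{named_fields}} are hypothetical, the prefixes are taken from the wildcard rows of the table above, and special cases such as %\{user-agent}i mapping to request.user-agent are deliberately ignored. This does not cover names that only appear inside values at runtime (e.g. individual cookies inside a Cookie header).

```python
import re

# Prefixes taken from the wildcard rows of the mapping table.
PREFIX_BY_DIRECTIVE = {
    "i": "request.header.",
    "o": "response.header.",
    "C": "request.cookies.",
    "e": "server.environment.",
    "n": "server.module_note.",
}

# Matches directives of the form %{name}X, e.g. %{Referer}i.
NAMED_DIRECTIVE = re.compile(r"%\{([^}]+)\}([A-Za-z])")


def named_fields(log_format: str) -> list:
    """List the wildcard-style field names implied by a log format string."""
    fields = []
    for name, directive in NAMED_DIRECTIVE.findall(log_format):
        prefix = PREFIX_BY_DIRECTIVE.get(directive)
        if prefix is not None:  # skip directives such as %{msec}t
            fields.append(prefix + name.lower())
    return fields


if __name__ == "__main__":
    fmt = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %{X-Trace}o'
    print(named_fields(fmt))
```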
h4. Other Possible Issues
Who is going to write the functions to expose the functionality for all Drill
queries?
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin. The author has been kind enough
> to move the logparser project to be released under the Apache License. Can
> find it here:
> <dependency>
> <groupId>nl.basjes.parse.httpdlog</groupId>
> <artifactId>httpdlog-parser</artifactId>
> <version>2.0</version>
> </dependency>
>