cgivre commented on pull request #2112:
URL: https://github.com/apache/drill/pull/2112#issuecomment-727583315
@nielsbasjes
Thanks for the quick review! I have a few more things to tweak before it's
ready for the next round of review, but since the refactoring I've noticed a
very significant improvement in query performance on log files on my machine.
Did you see any difference?
I have a question for you regarding the file extension. Right now, Drill
uses the file extension to determine which format plugin to use for parsing the
file(s). Another option is `defaultInputFormat`, which is set per workspace.
I'd imagine that in a real-world situation, web server logs would be kept where
they are generated, in a directory or series of directories. In that case, you
could define a workspace and set `defaultInputFormat` to `httpd`, which would
tell Drill to use the HTTPD plugin even when the files have no extension:
```json
"weblogs": {
  "location": "<path to logs>",
  "writable": false,
  "defaultInputFormat": "httpd"
}
```
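For context, here is a rough sketch of where that workspace might sit inside a file-based storage plugin configuration. Only the `weblogs` workspace comes from the snippet above; the `connection` value and the `formats` entry, including its `logFormat` and `timestampFormat` values, are assumptions for illustration and would need to match the actual server setup.

```json
{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "weblogs": {
      "location": "<path to logs>",
      "writable": false,
      "defaultInputFormat": "httpd"
    }
  },
  "formats": {
    "httpd": {
      "type": "httpd",
      "logFormat": "%h %l %u %t \"%r\" %>s %b",
      "timestampFormat": "dd/MMM/yyyy:HH:mm:ss ZZ"
    }
  }
}
```

With something like this in place, files under the workspace location should be read with the `httpd` format regardless of their extension.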
That said, I do like the idea of allowing users to define a filename pattern
that would be associated with a particular file type. I think that might be
out of scope for this PR, however.