nielsbasjes commented on pull request #2112:
URL: https://github.com/apache/drill/pull/2112#issuecomment-730235084


   I did some testing and found something worth discussing regarding the 
wildcards.
   
   _Note about all of these points; I'm fine with just putting a bit of 
documentation in place that describes these as known limitations._
   
   When I run `select *` on a table backed by this format and print the 
result set, the "wildcard" columns (such as the query parameters and the 
cookies) show up like this:
   ```
   `response_cookies_$` STRUCT<`apache` VARCHAR>,
   `request_firstline_uri_query_$` STRUCT<`aap` VARCHAR, `res` VARCHAR>,
   ```
   
   The first thing I noticed is that the names that actually occur in the data 
are reflected in the header. I assume this is just how RowSet::print() 
works. Do note that if you have a large variety of query parameters in your 
dataset this may become a big list.  
   
   What I find is that these wildcards do not work as I expected, compared 
to what the underlying parser does.
   
   
   Assuming the URI `/icons/powered_by_rh.png?aap=noot&res=1024x768`
   
   When I ask for `request_firstline_uri_query_$` I see something in the output 
that looks like what I expect: `{"noot", "1024x768"}`
   However, when I try to query a deeper entry directly, like 
`request_firstline_uri_query_aap`, I consistently see a `null` value.
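
A minimal sketch of the semantics described above, in plain Python. This is not the logparser or Drill implementation; the function names `query_wildcard` and `query_field` are hypothetical and exist only to illustrate the difference between the wildcard and the explicit field.

```python
from urllib.parse import urlsplit, parse_qsl, unquote_plus

URI = "/icons/powered_by_rh.png?aap=noot&res=1024x768"

def query_wildcard(uri):
    """Hypothetical stand-in for `request_firstline_uri_query_$`:
    decode and return every query parameter."""
    return dict(parse_qsl(urlsplit(uri).query))

def query_field(uri, name):
    """Hypothetical stand-in for `request_firstline_uri_query_<name>`:
    decode only the requested value and skip all other parameters.
    Simplification: the key itself is compared without decoding."""
    for pair in urlsplit(uri).query.split("&"):
        key, _, raw = pair.partition("=")
        if key == name:
            return unquote_plus(raw)
    return None

print(query_wildcard(URI))
print(query_field(URI, "aap"))
```

Under these assumptions the explicit field yields the single value for `aap` rather than `null`, which is the behaviour expected from the Drill column as well.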
   
   This "explicit" way of asking for a value exists so that the system does 
not need to URL-decode the "unwanted" fields; there is a bit of a 
performance impact if there are a lot of unwanted fields (query parameters / 
cookies) in the line at hand.
   
   Note that the underlying parser does support this; the example for Apache 
Pig shows this most clearly:
   
https://github.com/nielsbasjes/logparser/blob/master/examples/apache-pig/src/main/pig/demo.pig#L34
   
   Now the response cookies are special because they have limited support for a 
wildcard in the middle:
   ```
   `response_cookies_$_comment` VARCHAR,
   `response_cookies_$_domain` VARCHAR,
   `response_cookies_$_expires` TIMESTAMP,
   `response_cookies_$_path` VARCHAR,
   `response_cookies_$_value` VARCHAR,
   ```
   See 
https://github.com/nielsbasjes/logparser/blob/master/httpdlog/httpdlog-parser/src/test/java/nl/basjes/parse/httpdlog/ApacheHttpdLogParserTest.java#L161
   
   These are intended so you can ask for something 
like `STRING:response.cookies.jsessionid.path`
   
   Here I found that these also seem to always return `null`.
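
To illustrate what the "wildcard in the middle" is meant to resolve to, here is a rough Python sketch that flattens a `Set-Cookie` value into column names of the shape shown above. It uses the standard library `http.cookies` module rather than logparser; the function `flatten_set_cookie` and the sample header value are made up for illustration.

```python
from http.cookies import SimpleCookie

def flatten_set_cookie(header_value):
    """Hypothetical helper: map one Set-Cookie value to flat field names,
    replacing the `$` in `response_cookies_$_<attr>` with the cookie name."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    fields = {}
    for name, morsel in cookie.items():
        fields[f"response_cookies_{name}_value"] = morsel.value
        for attr in ("comment", "domain", "expires", "path"):
            # Attributes that were not set are empty strings; skip them.
            if morsel[attr]:
                fields[f"response_cookies_{name}_{attr}"] = morsel[attr]
    return fields

# Made-up sample header value, not taken from any real log line.
fields = flatten_set_cookie("jsessionid=abc123; Path=/app; Domain=example.com")
for key in sorted(fields):
    print(key, "=", fields[key])
```

With this sample, a request for the hypothetical column `response_cookies_jsessionid_path` would be expected to return `/app` instead of `null`.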
   

