[GitHub] [druid] clintropolis commented on pull request #12277: add support for 'front coded' string dictionaries for smaller string columns

GitBox Mon, 26 Sep 2022 15:08:22 -0700


clintropolis commented on PR #12277:
URL: https://github.com/apache/druid/pull/12277#issuecomment-1258693747


   >We need to add some query benchmarks that include more realistic filter 
lists. Likely, we should have one query with an IN filter with 100 entries, 
another with 1000 entries and a third with 10,000 entries. If we want to vary 
it, we should run them each with different "hit ratios", but for now, we can 
perhaps give them a 20% overlap with actual values that exist in the 
dictionaries. If we do that, I suspect that we will start exercising some of 
the code paths that I'm curious about.
   
   This is currently sort of provided by the `FrontCodedIndexedBenchmark`, 
which I think is the most appropriate place to measure that directly, though I 
only did 10k entries and didn't spend much time on adding non-existent values, 
since I wasn't specifically targeting the `IN` filter. I don't think the general 
SQL benchmark is the most appropriate place to go into that much depth on 
specific filters, and in my experience, most of the `IN` filters with 1k+ 
entries I have seen are machine-generated native queries rather than SQL, since 
the overhead of parsing that much text also starts to stick out.
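
   For illustration, here is a rough sketch of how filter value sets with a 
fixed "hit ratio" (e.g. 100, 1000, or 10,000 entries with 20% overlap) could be 
generated for such a benchmark. `InFilterValues` and `generate` are hypothetical 
names, not anything in Druid or in `FrontCodedIndexedBenchmark`, and the sketch 
assumes the dictionary values are distinct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Druid: builds sorted IN filter values with a
// chosen fraction of entries that actually exist in the dictionary.
public class InFilterValues
{
  public static TreeSet<String> generate(List<String> dictionary, int size, double hitRatio, long seed)
  {
    final Random random = new Random(seed);
    final Set<String> existing = new HashSet<>(dictionary);

    // Shuffle a copy so the "hit" values are spread across the dictionary.
    final List<String> shuffled = new ArrayList<>(dictionary);
    Collections.shuffle(shuffled, random);

    // Number of values that should match; capped by the dictionary size.
    final int hits = Math.min(shuffled.size(), (int) Math.round(size * hitRatio));
    final TreeSet<String> values = new TreeSet<>(shuffled.subList(0, hits));

    // Fill the remainder with values that are not in the dictionary. Each miss
    // is derived from an existing value so misses are also spread out in sort order.
    int i = 0;
    while (values.size() < size) {
      final String base = shuffled.isEmpty() ? "" : shuffled.get(i % shuffled.size());
      final String candidate = base + "#miss" + i;
      i++;
      if (!existing.contains(candidate)) {
        values.add(candidate);
      }
    }
    return values;
  }
}
```

   Calling `generate(dictionary, 1000, 0.2, seed)` would give a sorted set of 
1000 values, 200 of which exist in the dictionary, provided the dictionary has 
at least 200 distinct entries.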


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

