clintropolis commented on PR #12277: URL: https://github.com/apache/druid/pull/12277#issuecomment-1258693747
> We need to add some query benchmarks that include more realistic filter lists. Likely, we should have one query with an IN filter with 100 entries, another with 1000 entries and a third with 10,000 entries. If we want to vary it, we should run them each with different "hit ratios", but for now, we can perhaps give them a 20% overlap with actual values that exist in the dictionaries. If we do that, I suspect that we will start exercising some of the code paths that I'm curious about.

This is currently sort of provided by the `FrontCodedIndexedBenchmark`, which I think is the most appropriate place to measure this directly, though I only used 10k entries and didn't spend much time adding non-existent values, since I wasn't specifically targeting the `IN` filter. I don't think the general SQL benchmark is the right place to go into that much depth on specific filters. In my experience, most of the `IN` filters with 1k+ entries I have seen are machine-generated native queries rather than SQL, since the overhead of parsing that much text also starts to stick out.
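The benchmark parameters proposed above (IN lists of 100, 1,000 and 10,000 values, with roughly 20% of them present in the dictionary) can be sketched as a small value generator. This is a minimal, hypothetical sketch: `InFilterValues`, `buildDictionary`, `buildInFilterValues` and `countHits` are illustrative names, not Druid APIs, and a binary search over a plain sorted `String[]` stands in for the real dictionary lookup (it does not use the front-coded dictionary or the `FrontCodedIndexedBenchmark` code).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: builds IN-filter value lists with a controlled "hit ratio"
// against a sorted string dictionary. Not part of Druid.
public class InFilterValues {

  // Sorted dictionary of zero-padded values "v00000000", "v00000001", ...
  // Zero padding keeps lexicographic order equal to numeric order.
  static String[] buildDictionary(int size) {
    String[] dict = new String[size];
    for (int i = 0; i < size; i++) {
      dict[i] = String.format("v%08d", i);
    }
    return dict;
  }

  // Returns n distinct values, of which round(n * hitRatio) exist in dict
  // and the rest are guaranteed misses (prefix "x" never appears in dict).
  static String[] buildInFilterValues(String[] dict, int n, double hitRatio, long seed) {
    Random random = new Random(seed);
    int hits = (int) Math.round(n * hitRatio);
    if (hits > dict.length) {
      throw new IllegalArgumentException("dictionary too small for requested hits");
    }
    // Shuffle dictionary indices so the chosen hits are distinct and spread out.
    List<Integer> indices = new ArrayList<>(dict.length);
    for (int i = 0; i < dict.length; i++) {
      indices.add(i);
    }
    Collections.shuffle(indices, random);

    List<String> values = new ArrayList<>(n);
    for (int i = 0; i < hits; i++) {
      values.add(dict[indices.get(i)]);
    }
    for (int i = 0; i < n - hits; i++) {
      values.add(String.format("x%08d", i));
    }
    // Mix hits and misses so lookups are not in dictionary order.
    Collections.shuffle(values, random);
    return values.toArray(new String[0]);
  }

  // Counts how many IN-filter values are found in the sorted dictionary.
  static int countHits(String[] dict, String[] values) {
    int hits = 0;
    for (String v : values) {
      if (Arrays.binarySearch(dict, v) >= 0) {
        hits++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    String[] dict = buildDictionary(100_000);
    for (int n : new int[] {100, 1_000, 10_000}) {
      String[] values = buildInFilterValues(dict, n, 0.2, 42L);
      System.out.println(n + " values, " + countHits(dict, values) + " hits");
    }
  }
}
```

In a real benchmark, the lookup loop in `countHits` is the part that would be timed (for example, as a JMH `@Benchmark` method parameterized over list size and hit ratio), while the value lists would be built once during setup so that generation cost is excluded from the measurement.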
