gianm opened a new pull request, #15832: URL: https://github.com/apache/druid/pull/15832
If lots of keys map to the same value, reversing a `LOOKUP` call can slow planning down unacceptably. To protect against this, this patch introduces a parameter `sqlReverseLookupThreshold`, representing the maximum size of an IN filter that will be created as part of lookup reversal.

If `inSubQueryThreshold` is set to a smaller value than `sqlReverseLookupThreshold`, then `inSubQueryThreshold` is used instead. This allows users to control IN sizes with that single parameter if they wish.

Benchmarks follow. I chose `10000` as the default for `sqlReverseLookupThreshold` since it keeps planning time under 1 second: with 10,000 keys per value, planning takes roughly 0.56 to 0.78 seconds, whereas with 100,000 keys per value it takes roughly 8.4 to 9.0 seconds. Future work to speed up IN filters could allow us to raise the default threshold.

```
Benchmark                                (keysPerValue)  (lookupType)  (numKeys)  Mode  Cnt     Score     Error  Units
SqlReverseLookupBenchmark.planEquals               1000       hashmap    5000000  avgt    5   163.002 ±   4.228  ms/op
SqlReverseLookupBenchmark.planEquals               1000     immutable    5000000  avgt    5    43.095 ±   2.864  ms/op
SqlReverseLookupBenchmark.planEquals              10000       hashmap    5000000  avgt    5   734.592 ±  34.374  ms/op
SqlReverseLookupBenchmark.planEquals              10000     immutable    5000000  avgt    5   555.980 ±  49.903  ms/op
SqlReverseLookupBenchmark.planEquals             100000       hashmap    5000000  avgt    5  8545.459 ± 108.931  ms/op
SqlReverseLookupBenchmark.planEquals             100000     immutable    5000000  avgt    5  8415.105 ± 116.926  ms/op
SqlReverseLookupBenchmark.planNotEquals            1000       hashmap    5000000  avgt    5   257.995 ±   5.576  ms/op
SqlReverseLookupBenchmark.planNotEquals            1000     immutable    5000000  avgt    5    41.088 ±   1.582  ms/op
SqlReverseLookupBenchmark.planNotEquals           10000       hashmap    5000000  avgt    5   776.826 ±   8.265  ms/op
SqlReverseLookupBenchmark.planNotEquals           10000     immutable    5000000  avgt    5   583.022 ±  19.766  ms/op
SqlReverseLookupBenchmark.planNotEquals          100000       hashmap    5000000  avgt    5  9019.350 ± 144.835  ms/op
SqlReverseLookupBenchmark.planNotEquals          100000     immutable    5000000  avgt    5  8754.859 ± 429.341  ms/op
```
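To make the interaction between the two parameters concrete, here is a minimal sketch of the rule described above. It is not the code from this patch: the class and method names are hypothetical, and treating the limit as inclusive (an IN filter of exactly the threshold size is still allowed) is an assumption based on the phrase "maximum size".

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class ReverseLookupThresholdSketch {
    // Default for sqlReverseLookupThreshold stated in the PR description.
    static final int DEFAULT_SQL_REVERSE_LOOKUP_THRESHOLD = 10000;

    // The smaller of the two settings wins, so inSubQueryThreshold alone
    // can be used to cap IN filter sizes.
    static int effectiveThreshold(int inSubQueryThreshold, int sqlReverseLookupThreshold) {
        return Math.min(inSubQueryThreshold, sqlReverseLookupThreshold);
    }

    // Assumption: the limit is inclusive. A reversal that would need more
    // keys than the effective threshold is skipped.
    static boolean shouldReverse(int matchingKeys, int inSubQueryThreshold, int sqlReverseLookupThreshold) {
        return matchingKeys <= effectiveThreshold(inSubQueryThreshold, sqlReverseLookupThreshold);
    }

    public static void main(String[] args) {
        int limit = effectiveThreshold(500, DEFAULT_SQL_REVERSE_LOOKUP_THRESHOLD);
        System.out.println("effective threshold: " + limit);
        System.out.println("reverse 1000 keys? "
            + shouldReverse(1000, 500, DEFAULT_SQL_REVERSE_LOOKUP_THRESHOLD));
    }
}
```

In this sketch, a value that maps back to 1,000 keys is not reversed when `inSubQueryThreshold` is 500, even though the default `sqlReverseLookupThreshold` alone would have allowed it.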
