clintropolis opened a new pull request, #16300:
URL: https://github.com/apache/druid/pull/16300
### Description
This PR switches the `OR` filter "partial index" value matchers to use
`PeekableIntIterator` instead of `IntIterator` which can dramatically improve
performance when these are used on top of an index offset which has a small
number of set bits (since we can use `advanceIfNeeded` instead of looping to
check `next` until we seek the correct offset). These matchers are used when
some sub-filters support indexes we make a synthetic value matcher that checks
the index instead of actually evaluating the matchers, which can be quite
beneficial, but in certain scenarios can also be rather expensive.
The added benchmark is one such expensive scenario, where the OR filter is
nested under an `AND` filter. The equality clause of the AND is rather
selective, but the first clause of the `OR` matches all rows, so the while loop
of the previous code needs to call `hasNext`/`next` quite a lot to seek to the
next offset.
before:
```
Benchmark (query) (rowsPerSegment) (schema) (storageType)
(stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 41 5000000 explicit mmap
none false avgt 5 46.568 ± 18.771 ms/op
SqlBenchmark.querySql 41 5000000 explicit mmap
none force avgt 5 13.940 ± 1.099 ms/op
```
after:
```
Benchmark (query) (rowsPerSegment) (schema) (storageType)
(stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 41 5000000 explicit mmap
none false avgt 5 2.877 ± 0.347 ms/op
SqlBenchmark.querySql 41 5000000 explicit mmap
none force avgt 5 3.841 ± 1.413 ms/op
```
I should've known to do this in the first place given #8822 and done this as
part of #15838, but the second best time is today or something.
<hr>
This PR has:
- [x] been self-reviewed.
- [ ] a release note entry in the PR description.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]