clintropolis opened a new pull request, #16300:
URL: https://github.com/apache/druid/pull/16300

   ### Description
   This PR switches the `OR` filter "partial index" value matchers to use 
`PeekableIntIterator` instead of `IntIterator` which can dramatically improve 
performance when these are used on top of an index offset which has a small 
number of set bits (since we can use `advanceIfNeeded` instead of looping to 
check `next` until we seek the correct offset). These matchers are used when 
some sub-filters support indexes we make a synthetic value matcher that checks 
the index instead of actually evaluating the matchers, which can be quite 
beneficial, but in certain scenarios can also be rather expensive.
   
   
   The added benchmark is one such expensive scenario, where the OR filter is 
nested under an `AND` filter. The equality clause of the AND is rather 
selective, but the first clause of the `OR` matches all rows, so the while loop 
of the previous code needs to call `hasNext`/`next` quite a lot to seek to the 
next offset.
   
   before:
   ```
   Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  
(stringEncoding)  (vectorize)  Mode  Cnt   Score    Error  Units
   SqlBenchmark.querySql       41           5000000  explicit           mmap    
          none        false  avgt    5  46.568 ± 18.771  ms/op
   SqlBenchmark.querySql       41           5000000  explicit           mmap    
          none        force  avgt    5  13.940 ±  1.099  ms/op
   ```
   
   after:
   ```
   Benchmark              (query)  (rowsPerSegment)  (schema)  (storageType)  
(stringEncoding)  (vectorize)  Mode  Cnt  Score   Error  Units
   SqlBenchmark.querySql       41           5000000  explicit           mmap    
          none        false  avgt    5  2.877 ± 0.347  ms/op
   SqlBenchmark.querySql       41           5000000  explicit           mmap    
          none        force  avgt    5  3.841 ± 1.413  ms/op
   ```
   
   I should've known to do this in the first place given #8822 and done this as 
part of #15838, but the second best time is today or something.
   
   
   <hr>
   
   
   This PR has:
   
   - [x] been self-reviewed.
   - [ ] a release note entry in the PR description.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to