clintropolis opened a new pull request, #19023:
URL: https://github.com/apache/druid/pull/19023

   changes:
   * Added `getValueIterator` method to `DictionaryEncodedValueIndex` to give 
an easy way for consumers to iterate the dictionary values in order
   * `ExpressionPredicateIndexSupplier` now uses `getValueIterator` to scan the 
dictionary values, offering a performance improvement, particularly when using 
front-coding
   * Fixed a few other places that iterated the dictionary with `get` so they 
use the iterator instead
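
To illustrate why iterating can beat repeated `get` calls on a front-coded dictionary, here is a minimal sketch. It is a toy model and not Druid's actual front-coded implementation: `ToyFrontCodedDictionary`, its bucket layout, and the `decodeSteps` counter are all hypothetical names introduced for illustration. It assumes a simplified scheme in which each value stores only the suffix that differs from its predecessor in the same bucket, so a random `get(i)` has to replay the bucket from its first entry, while an iterator decodes each value once from the one it just returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: NOT Druid's front-coded implementation; every name here is hypothetical.
final class FrontCodedIterationSketch
{
  static final class ToyFrontCodedDictionary implements Iterable<String>
  {
    private final int bucketSize;
    private final List<Integer> prefixLengths = new ArrayList<>();
    private final List<String> suffixes = new ArrayList<>();
    long decodeSteps = 0; // counts how many entries had to be decoded

    ToyFrontCodedDictionary(List<String> sortedValues, int bucketSize)
    {
      this.bucketSize = bucketSize;
      String previous = "";
      for (int i = 0; i < sortedValues.size(); i++) {
        String value = sortedValues.get(i);
        // The first value of each bucket is stored whole; the rest store only a suffix.
        int shared = (i % bucketSize == 0) ? 0 : commonPrefixLength(previous, value);
        prefixLengths.add(shared);
        suffixes.add(value.substring(shared));
        previous = value;
      }
    }

    int size()
    {
      return suffixes.size();
    }

    // Random access: replays the bucket from its first entry up to the requested index.
    String get(int index)
    {
      String value = "";
      for (int i = (index / bucketSize) * bucketSize; i <= index; i++) {
        value = decode(value, i);
      }
      return value;
    }

    // Sequential access: each value is decoded once, from the value returned just before it.
    @Override
    public Iterator<String> iterator()
    {
      return new Iterator<String>()
      {
        private int position = 0;
        private String previous = "";

        @Override
        public boolean hasNext()
        {
          return position < size();
        }

        @Override
        public String next()
        {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          previous = decode(previous, position++);
          return previous;
        }
      };
    }

    private String decode(String previous, int index)
    {
      decodeSteps++;
      return previous.substring(0, prefixLengths.get(index)) + suffixes.get(index);
    }

    private static int commonPrefixLength(String a, String b)
    {
      int n = Math.min(a.length(), b.length());
      int i = 0;
      while (i < n && a.charAt(i) == b.charAt(i)) {
        i++;
      }
      return i;
    }
  }

  static long decodeStepsUsingGet(ToyFrontCodedDictionary dictionary)
  {
    dictionary.decodeSteps = 0;
    for (int i = 0; i < dictionary.size(); i++) {
      dictionary.get(i);
    }
    return dictionary.decodeSteps;
  }

  static long decodeStepsUsingIterator(ToyFrontCodedDictionary dictionary)
  {
    dictionary.decodeSteps = 0;
    for (String ignored : dictionary) {
      // only walking the values to count decode work
    }
    return dictionary.decodeSteps;
  }

  public static void main(String[] args)
  {
    List<String> values = Arrays.asList("druid", "druidry", "drum", "drummer", "dry", "dryad", "dryer", "dust");
    ToyFrontCodedDictionary dictionary = new ToyFrontCodedDictionary(values, 4);
    System.out.println("decode steps with get(i):  " + decodeStepsUsingGet(dictionary));
    System.out.println("decode steps with iterator: " + decodeStepsUsingIterator(dictionary));
  }
}
```

With these eight sample values and a bucket size of 4, the `get` loop performs 20 decode steps and the iterator performs 8. The gap grows with the bucket size, which is consistent with the improvement being most visible with front-coding.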
   
   Credit to #19004 for the added benchmark query and for bringing this issue 
to attention: with front-coding, computing the indexes was slower than just 
doing a full scan (at least in some cases, such as this query).
   
   before:
   ```
   Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)   (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  522.387 ± 22.942  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  501.122 ± 17.559  ms/op
   SqlExpressionBenchmark.querySql                  NONE                   fixedWidth                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  547.506 ± 15.055  ms/op
   SqlExpressionBenchmark.querySql                  NONE                   fixedWidth                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  446.650 ±  5.308  ms/op
   SqlExpressionBenchmark.querySql                  NONE         fixedWidthNonNumeric                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  572.099 ± 67.823  ms/op
   SqlExpressionBenchmark.querySql                  NONE         fixedWidthNonNumeric                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  499.534 ± 19.926  ms/op
   SqlExpressionBenchmark.querySql                  NONE                       always                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  549.607 ± 25.846  ms/op
   SqlExpressionBenchmark.querySql                  NONE                       always                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  496.660 ± 16.439  ms/op
   ```
   
   after:
   ```
   Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)   (stringEncoding)  (vectorize)  Mode  Cnt    Score     Error  Units
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  428.333 ±  14.320  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  364.073 ±   5.671  ms/op
   SqlExpressionBenchmark.querySql                  NONE                   fixedWidth                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  423.951 ±  12.710  ms/op
   SqlExpressionBenchmark.querySql                  NONE                   fixedWidth                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  371.926 ±   5.133  ms/op
   SqlExpressionBenchmark.querySql                  NONE         fixedWidthNonNumeric                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  424.357 ±  10.445  ms/op
   SqlExpressionBenchmark.querySql                  NONE         fixedWidthNonNumeric                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  419.708 ±  71.678  ms/op
   SqlExpressionBenchmark.querySql                  NONE                       always                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        false  avgt    5  444.724 ± 112.962  ms/op
   SqlExpressionBenchmark.querySql                  NONE                       always                        SMILE       61           1500000      explicit           MMAP  FRONT_CODED_16_V1        force  avgt    5  373.843 ±   8.409  ms/op
   ```
   
   I also considered adding a `getBitmapsIterator` to 
`DictionaryEncodedValueIndex`, but ultimately decided against it: most of the 
bitmap `get` methods coerce null values to empty bitmaps, so they can't use the 
underlying `Indexed` iterator directly, which sounded more tedious than I 
wanted to deal with here. This could be a follow-up, so that the places that 
iterate the dictionary and collect the corresponding bitmaps can use iterators 
for both instead of keeping a counter, or use some convenient structure that 
iterates both at the same time so nothing needs to be kept in sync.
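
As a rough idea of what such a combined structure could look like, here is a minimal sketch. It is not part of this PR and not a Druid API: `ValueAndBitmap` and `zip` are hypothetical names, and `java.util.BitSet` stands in for Druid's bitmap type. It assumes the dictionary and bitmap iterators yield entries in the same order, and it performs the null-to-empty coercion mentioned above while pairing them up.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: pairs dictionary values with their bitmaps without a manual counter.
final class ValueAndBitmapIterationSketch
{
  // Simple holder for one dictionary value and its bitmap (BitSet is only a stand-in).
  static final class ValueAndBitmap
  {
    final String value;
    final BitSet bitmap;

    ValueAndBitmap(String value, BitSet bitmap)
    {
      this.value = value;
      this.bitmap = bitmap;
    }
  }

  // Walks both iterators in lockstep; a null bitmap is coerced to an empty one.
  static Iterator<ValueAndBitmap> zip(final Iterator<String> values, final Iterator<BitSet> bitmaps)
  {
    return new Iterator<ValueAndBitmap>()
    {
      @Override
      public boolean hasNext()
      {
        return values.hasNext() && bitmaps.hasNext();
      }

      @Override
      public ValueAndBitmap next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        String value = values.next();
        BitSet bitmap = bitmaps.next();
        return new ValueAndBitmap(value, bitmap == null ? new BitSet() : bitmap);
      }
    };
  }

  public static void main(String[] args)
  {
    BitSet rowsForA = new BitSet();
    rowsForA.set(0);
    rowsForA.set(3);
    List<String> dictionary = Arrays.asList("a", "b");
    List<BitSet> bitmaps = Arrays.asList(rowsForA, null); // "b" has no stored bitmap

    Iterator<ValueAndBitmap> pairs = zip(dictionary.iterator(), bitmaps.iterator());
    while (pairs.hasNext()) {
      ValueAndBitmap pair = pairs.next();
      System.out.println(pair.value + " -> " + pair.bitmap);
    }
  }
}
```

Keeping the coercion inside the pairing iterator means callers never see a null bitmap, which is the behavior the existing `get` methods are described as providing.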



