rohangarg opened a new pull request, #13133:
URL: https://github.com/apache/druid/pull/13133

   Currently, for checking the in-filter values in a column dictionary we use a 
binary search per value in the set. That works well for smaller value-sets but 
starts slowing down as the number of values in the set increase. To accommodate 
for large value-sets arising from large in-filters or from joins being pushed 
down as in-filters, we use sorted merge algorithm for merging the set and 
dictionary for larger values.
   
   The following benchmark was run to find the cutoff point :
   
   ```
   Benchmark     (dictionarySize)  (filterToDictionaryPercentage)  
(selectivityPercentage)  Mode  Cnt        Score       Error  Units
   binarySearch           1000000                               1               
        10  avgt   10    11271.575 ±   621.620  us/op
   binarySearch           1000000                               1               
       100  avgt   10    16751.127 ±   580.550  us/op
   binarySearch           1000000                               2               
        10  avgt   10    25387.245 ±  1795.582  us/op
   binarySearch           1000000                               2               
       100  avgt   10    19707.334 ±   973.384  us/op
   binarySearch           1000000                               5               
        10  avgt   10    37256.779 ±  2287.203  us/op
   binarySearch           1000000                               5               
       100  avgt   10    47391.511 ±  1689.971  us/op
   binarySearch           1000000                              10               
        10  avgt   10    76204.615 ±  6515.056  us/op
   binarySearch           1000000                              10               
       100  avgt   10    71483.416 ±  7197.376  us/op
   binarySearch           1000000                              12               
        10  avgt   10  139481.091 ± 15513.357  us/op
   binarySearch           1000000                              12               
       100  avgt   10  142881.846 ±  9511.367  us/op
   binarySearch           1000000                              15               
        10  avgt   10   113786.273 ±  4123.158  us/op
   binarySearch           1000000                              15               
       100  avgt   10   165300.278 ± 10555.479  us/op
   binarySearch           1000000                              20               
        10  avgt   10   138410.942 ± 14330.367  us/op
   binarySearch           1000000                              20               
       100  avgt   10   137543.621 ±  9273.845  us/op
   binarySearch           1000000                              30               
        10  avgt   10   206512.608 ± 13497.954  us/op
   binarySearch           1000000                              30               
       100  avgt   10   305317.908 ± 12452.686  us/op
   binarySearch           1000000                              50               
        10  avgt   10   328867.893 ± 14594.249  us/op
   binarySearch           1000000                              50               
       100  avgt   10   332823.291 ± 26349.158  us/op
   binarySearch           1000000                             100               
        10  avgt   10   668720.906 ± 49193.818  us/op
   binarySearch           1000000                             100               
       100  avgt   10  1014546.405 ± 30709.670  us/op
   sortedMerge            1000000                               1               
        10  avgt   10    47220.743 ±  2598.755  us/op
   sortedMerge            1000000                               1               
       100  avgt   10    53634.485 ±  2886.296  us/op
   sortedMerge            1000000                               2               
        10  avgt   10    51201.356 ±  2745.801  us/op
   sortedMerge            1000000                               2               
       100  avgt   10    53046.058 ±  4500.987  us/op
   sortedMerge            1000000                               5               
        10  avgt   10    58501.742 ±  7320.870  us/op
   sortedMerge            1000000                               5               
       100  avgt   10    65597.519 ±  7356.548  us/op
   sortedMerge            1000000                              10               
        10  avgt   10    75347.468 ±  9417.556  us/op
   sortedMerge            1000000                              10               
       100  avgt   10    74601.584 ±  5251.078  us/op
   sortedMerge            1000000                              12               
        10  avgt   10   64838.734 ±  2644.738  us/op
   sortedMerge            1000000                              12               
       100  avgt   10   80342.737 ±  5414.306  us/op
   sortedMerge            1000000                              15               
        10  avgt   10    83345.836 ±  6034.488  us/op
   sortedMerge            1000000                              15               
       100  avgt   10    78405.299 ±  1651.375  us/op
   sortedMerge            1000000                              20               
        10  avgt   10   111307.577 ± 10924.456  us/op
   sortedMerge            1000000                              20               
       100  avgt   10    89371.173 ± 10347.814  us/op
   sortedMerge            1000000                              30               
        10  avgt   10   116740.355 ±  6003.945  us/op
   sortedMerge            1000000                              30               
       100  avgt   10   117312.763 ±  6325.076  us/op
   sortedMerge            1000000                              50               
        10  avgt   10   132145.411 ± 23121.259  us/op
   sortedMerge            1000000                              50               
       100  avgt   10   192802.722 ± 39032.876  us/op
   sortedMerge            1000000                             100               
        10  avgt   10   216079.634 ± 28725.333  us/op
   sortedMerge            1000000                             100               
       100  avgt   10   236561.476 ± 14369.673  us/op
   ```
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to