amrishlal commented on issue #7437:
URL: https://github.com/apache/pinot/issues/7437#issuecomment-923691071


   The proposal doc is very interesting reading. Thanks for creating it. I am 
still in the process of reading through it, but wanted to add a few points 
regarding the current RangeIndex. A while back, we had done a high-level 
evaluation of the current RangeIndex algorithm and came up with the following 
optimizations:
   - If there are N values in the column being indexed, then 
`RangeIndexCreator` should automatically set the number of buckets to `sqrt(N)` 
as this is proven to be the most optimal size for searching the right bucket 
and then searching within the bucket for the right position.
   - Instead of carrying out linear search, `RangeIndexBasedFilterOperator` 
should carry out a binary search to locate the right bucket.
   - A binary search should also be carried out to locate the position with the 
bucket (not completely sure if this can be done).
   
   We concluded that these three improvements would improve the performance of 
RangeIndex to 1/2 * log2(N) for any given starting or ending value. We didn't 
get around to making these optimizations, but I thought I would put these here 
for completeness and performance comparison against the "Bit Slice Range 
Index". The changes (except for maybe the third one) should be straight-forward 
to implement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to