[GitHub] [pinot] Jackie-Jiang commented on issue #8800: Add a "Distinct" implementation that leverages index for low cardinality columns

GitBox Wed, 01 Jun 2022 10:22:42 -0700


Jackie-Jiang commented on issue #8800:
URL: https://github.com/apache/pinot/issues/8800#issuecomment-1143908580


   I think I get the general idea of using an inverted index to solve DISTINCT and group-by queries:
   - SELECT DISTINCT colA FROM myTable WHERE ...
   - SELECT COUNT(*) FROM myTable WHERE ... GROUP BY colA
   
   When `colA` has an inverted index, we can solve the query by scanning all of its bitmaps (one per dictionary id, each intersected with the filter result) instead of scanning the matching docs.
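   A minimal sketch contrasting the two strategies on a toy segment. It is written in Python for brevity: Pinot itself is Java and uses RoaringBitmap, while here plain Python ints stand in for bitmaps. The data layout (`DICTIONARY`, `FORWARD_INDEX`) and every function name below are hypothetical illustrations, not Pinot APIs. Both strategies should return the same answer. They differ only in what they iterate: the docs matching the filter, or one bitmap per dictionary id.

```python
from collections import Counter

# Hypothetical toy segment: one single-value column "colA", dictionary-encoded.
DICTIONARY = ["CA", "NY", "TX"]            # dictionary id -> value
FORWARD_INDEX = [0, 2, 1, 0, 0, 2, 1, 0]   # doc id -> dictionary id


def build_inverted_index(forward_index, cardinality):
    """Return one bitmap per dictionary id; bit d is set if doc d holds that id.

    Python ints stand in for real bitmaps (Pinot uses RoaringBitmap)."""
    bitmaps = [0] * cardinality
    for doc_id, dict_id in enumerate(forward_index):
        bitmaps[dict_id] |= 1 << doc_id
    return bitmaps


def iter_doc_ids(bitmap):
    """Yield the doc ids (set bits) of a bitmap in ascending order."""
    while bitmap:
        lowest = bitmap & -bitmap      # isolate the lowest set bit
        yield lowest.bit_length() - 1
        bitmap ^= lowest               # clear it and continue


def popcount(bitmap):
    return bin(bitmap).count("1")


# Current approach: iterate the docs that match the filter and keep a small
# set/map keyed on dictionary id (never more than `cardinality` entries).
def distinct_scan(filter_bitmap, forward_index, dictionary):
    dict_ids = {forward_index[d] for d in iter_doc_ids(filter_bitmap)}
    return {dictionary[i] for i in dict_ids}


def group_by_count_scan(filter_bitmap, forward_index, dictionary):
    counts = Counter(forward_index[d] for d in iter_doc_ids(filter_bitmap))
    return {dictionary[i]: c for i, c in counts.items()}


# Index-based approach: visit one bitmap per dictionary id and intersect it
# with the filter result; the loop runs per value, not per matching doc.
def distinct_bitmaps(filter_bitmap, inverted_index, dictionary):
    return {dictionary[i] for i, bm in enumerate(inverted_index)
            if bm & filter_bitmap}     # keep values with a non-empty intersection


def group_by_count_bitmaps(filter_bitmap, inverted_index, dictionary):
    result = {}
    for i, bm in enumerate(inverted_index):
        count = popcount(bm & filter_bitmap)
        if count:
            result[dictionary[i]] = count
    return result


INVERTED_INDEX = build_inverted_index(FORWARD_INDEX, len(DICTIONARY))
FILTER_BITMAP = sum(1 << d for d in (1, 2, 5))  # docs 1, 2, 5 match WHERE ...

print(sorted(group_by_count_scan(FILTER_BITMAP, FORWARD_INDEX, DICTIONARY).items()))
# [('NY', 1), ('TX', 2)]
print(sorted(group_by_count_bitmaps(FILTER_BITMAP, INVERTED_INDEX, DICTIONARY).items()))
# [('NY', 1), ('TX', 2)]
print(sorted(distinct_bitmaps(FILTER_BITMAP, INVERTED_INDEX, DICTIONARY)))
# ['NY', 'TX']
```

   The scan loop runs once per matching doc. The bitmap loop runs once per dictionary id, but each AND may touch the whole bitmap, which is the trade-off behind the conditions that follow.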
   
   This might accelerate the query when the following conditions are met:
   - `colA` has low cardinality (we don't want to scan too many bitmaps)
   - The filter has low selectivity (lots of records match, so the scanning cost is relatively high)
   
   Note that when `colA` has low cardinality, the current approach won't be very costly either: we maintain a small set/map keyed on dictionary ids, with at most cardinality-many entries. Scanning the bitmaps is not strictly O(cardinality), because processing each bitmap can cost up to O(number of rows). We should evaluate and find the break-even point at which this optimization outperforms the current solution.
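   A rough back-of-the-envelope model may help frame that break-even point. It is only a sketch under stated assumptions: a single-value column, uncompressed bitsets ANDed one machine word at a time (RoaringBitmap containers behave differently, so treat this as a coarse bound), and a hypothetical constant k for the cost of processing one matched doc in the current approach (dictionary-id read plus set/map update) relative to one word-wise AND-and-popcount. Let N be the number of rows in the segment, M the number of rows matching the filter, C the cardinality of `colA`, and w the word width in bits.

```latex
T_{\text{scan}} \approx k \cdot M
\qquad
T_{\text{bitmap}} \approx C \cdot \frac{N}{w}

T_{\text{bitmap}} < T_{\text{scan}}
  \iff C \cdot \frac{N}{w} < k \cdot M
  \iff C < k \cdot w \cdot \frac{M}{N}
```

   With w = 64 and k = 1, a filter matching half the rows (M/N = 0.5) gives C < 32, while one matching 5% of rows gives C < 3.2. Both conditions above show up directly: small C and large M/N. The unknown k is what a benchmark would need to pin down.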

