Jackie-Jiang commented on issue #8800: URL: https://github.com/apache/pinot/issues/8800#issuecomment-1143908580
I think I get the general idea of using the inverted index to solve distinct and group-by queries:
- `SELECT DISTINCT colA FROM myTable WHERE ...`
- `SELECT COUNT(*) FROM myTable WHERE ... GROUP BY colA`

When `colA` has an inverted index, we can solve the query by scanning all of its bitmaps (one per value) instead of scanning the matching docs. This might accelerate the query when both of the following conditions are met:
- `colA` has low cardinality (we don't want to scan too many bitmaps)
- The filter has low selectivity, i.e. lots of records match, so the cost of scanning the matching docs is relatively high

Note that when `colA` has low cardinality, the current approach isn't very costly either: we only maintain a small set/map keyed on dictionary ids, whose size is bounded by the cardinality. Also, scanning the bitmaps is not strictly O(cardinality), because processing each bitmap can cost up to O(number of rows). We should evaluate both approaches and find the break-even point at which this optimization outperforms the current solution.
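To make the two strategies concrete, here is a minimal sketch in Python. It assumes the column is already dictionary-encoded into ids `0..cardinality-1`, and it uses plain Python integers as stand-ins for the per-value bitmaps (a real inverted index would use compressed bitmaps such as RoaringBitmap). All function names are hypothetical and are not Pinot APIs.

```python
from collections import Counter


def build_inverted_index(dict_ids, cardinality):
    """Build one bitmap (a Python int) per dictionary id; bit d is set if doc d holds that id."""
    index = [0] * cardinality
    for doc_id, dict_id in enumerate(dict_ids):
        index[dict_id] |= 1 << doc_id
    return index


def matching_docs(dict_ids, filter_bitmap):
    """Yield (doc_id, dict_id) for every doc whose bit is set in the filter bitmap."""
    for doc_id, dict_id in enumerate(dict_ids):
        if (filter_bitmap >> doc_id) & 1:
            yield doc_id, dict_id


# Current approach: cost grows with the number of matching docs;
# the set/map size is bounded by the cardinality.
def group_by_count_scan(dict_ids, filter_bitmap):
    return dict(Counter(dict_id for _, dict_id in matching_docs(dict_ids, filter_bitmap)))


def distinct_scan(dict_ids, filter_bitmap):
    return {dict_id for _, dict_id in matching_docs(dict_ids, filter_bitmap)}


# Inverted-index approach: one AND + popcount per dictionary id.
# Each AND costs time proportional to the bitmap sizes, so the total is not strictly O(cardinality).
def group_by_count_inverted(index, filter_bitmap):
    counts = {}
    for dict_id, bitmap in enumerate(index):
        matched = bin(bitmap & filter_bitmap).count("1")
        if matched:
            counts[dict_id] = matched
    return counts


def distinct_inverted(index, filter_bitmap):
    # DISTINCT only needs to know whether each bitmap intersects the filter, not how many docs match.
    return {dict_id for dict_id, bitmap in enumerate(index) if bitmap & filter_bitmap}


if __name__ == "__main__":
    dict_ids = [0, 1, 0, 2, 1, 0, 2, 0]  # dictionary-encoded colA, cardinality 3
    filter_bitmap = 0b10101101          # WHERE clause matches docs 0, 2, 3, 5, 7
    index = build_inverted_index(dict_ids, cardinality=3)
    print(group_by_count_scan(dict_ids, filter_bitmap))   # {0: 4, 2: 1}
    print(group_by_count_inverted(index, filter_bitmap))  # {0: 4, 2: 1}
    print(distinct_inverted(index, filter_bitmap))        # {0, 2}
```

Both variants return the same result; what differs is where the work goes. The scan variant touches every matching doc once, while the inverted variant touches every bitmap once, which is why the trade-off depends on both the cardinality of `colA` and how many docs the filter matches.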
