Jackie-Jiang opened a new pull request #8074: URL: https://github.com/apache/pinot/pull/8074
## Description

For the `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation functions, we currently use a `Set` to store all the values, which can cause memory issues and potentially exhaust the memory of Servers or Brokers.

This PR adds support for automatically converting the `Set` to a `HyperLogLog` when the set grows too large, to protect the servers. The conversion applies only to aggregation-only queries, not to group-by queries.

By default, when the set size exceeds 100K, the set is converted to a HyperLogLog with log2m of 12. The log2m and threshold can be configured using the second argument (literal) of the function:

- `hllLog2m`: log2m of the converted HyperLogLog (default 12)
- `hllConversionThreshold`: set size threshold that triggers the conversion; a non-positive value means never convert (default 100K)

Example query: `SELECT DISTINCTCOUNT(myCol, 'hllLog2m=8;hllConversionThreshold=10') FROM myTable`

## Release Notes

Add a second argument (literal) to the `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation functions for optional parameters:

- `hllLog2m`: log2m of the converted HyperLogLog (default 12)
- `hllConversionThreshold`: set size threshold that triggers the conversion; a non-positive value means never convert (default 100K)

For `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation-only queries, if the number of distinct values exceeds 100K, the query will by default use `HyperLogLog` and return an approximate result. To restore the exact behavior, set `hllConversionThreshold` to a non-positive value.
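To illustrate the parameter literal and the threshold rule described above, here is a minimal sketch in Java. It is not the code from this PR: the class `DistinctCountParams` and its methods `parse` and `shouldConvert` are hypothetical names chosen for illustration, and exact-case key matching is an assumption. Only the parameter names, the `key=value;key=value` shape, the defaults (12 and 100K), and the "non-positive means never convert" rule come from the description.

```java
// Hypothetical helper (not Pinot's actual class): holds the two optional
// parameters passed as the second argument of DISTINCTCOUNT.
final class DistinctCountParams {
  static final int DEFAULT_HLL_LOG2M = 12;
  static final int DEFAULT_HLL_CONVERSION_THRESHOLD = 100_000;

  final int hllLog2m;
  final int hllConversionThreshold;

  DistinctCountParams(int hllLog2m, int hllConversionThreshold) {
    this.hllLog2m = hllLog2m;
    this.hllConversionThreshold = hllConversionThreshold;
  }

  // Parses a literal such as "hllLog2m=8;hllConversionThreshold=10".
  // Parameters that are not present keep their default values.
  static DistinctCountParams parse(String literal) {
    int log2m = DEFAULT_HLL_LOG2M;
    int threshold = DEFAULT_HLL_CONVERSION_THRESHOLD;
    if (literal != null && !literal.trim().isEmpty()) {
      for (String pair : literal.split(";")) {
        String[] keyAndValue = pair.split("=", 2);
        if (keyAndValue.length != 2) {
          throw new IllegalArgumentException("Invalid parameter: " + pair);
        }
        String key = keyAndValue[0].trim();
        int value = Integer.parseInt(keyAndValue[1].trim());
        // Assumption: keys are matched with exact case.
        switch (key) {
          case "hllLog2m":
            log2m = value;
            break;
          case "hllConversionThreshold":
            threshold = value;
            break;
          default:
            throw new IllegalArgumentException("Unknown parameter: " + key);
        }
      }
    }
    return new DistinctCountParams(log2m, threshold);
  }

  // The set is converted only when its size exceeds a positive threshold;
  // a non-positive threshold disables the conversion entirely.
  boolean shouldConvert(int setSize) {
    return hllConversionThreshold > 0 && setSize > hllConversionThreshold;
  }
}
```

With the example query's literal, `parse` yields log2m 8 and threshold 10, so a set of 11 values would trigger the conversion while a set of 10 would not. Passing `hllConversionThreshold=0` makes `shouldConvert` always return false, which corresponds to the exact (non-approximate) behavior.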
