fx19880617 edited a comment on issue #4248: Alternative HyperLogLog implementation with higher accuracy
URL: https://github.com/apache/incubator-pinot/issues/4248#issuecomment-496738719

Could you elaborate on the performance? Have you done any performance testing?

Here is how Uber uses fasthll: there is a pre-aggregation phase before data ingestion. For example, we have a job that publishes every dimension combination together with its corresponding unique-count FASTHLL object every 5 minutes, and the offline job runs on a daily basis. This reduces the data volume to only 10% of the original size. From both a storage and a computation perspective, fasthll is more efficient IMO.
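To make the pre-aggregation idea concrete, here is a rough sketch of what such a job could look like. It is only an illustration, not Pinot's FASTHLL code or Uber's actual pipeline: `TinyHLL`, `pre_aggregate`, `distinct_count`, the 12-bit register index, and the `(timestamp, dims, user_id)` event shape are all assumptions made up for this example. Raw events are grouped by 5-minute window and dimension combination, each group keeps one sketch instead of its raw rows, and a query merges the sketches it needs.

```python
import hashlib
import math
from collections import defaultdict

P = 12                     # index bits (assumed); gives 4096 registers
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)


class TinyHLL:
    """Minimal HyperLogLog, for illustration only."""

    def __init__(self):
        self.registers = [0] * M

    def add(self, value):
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - P)                          # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)             # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1      # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Register-wise max: same result as sketching the union directly.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def estimate(self):
        raw = ALPHA * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            return M * math.log(M / zeros)           # small-range correction
        return raw


def pre_aggregate(events, window_seconds=300):
    """Collapse raw (timestamp, dims, user_id) events into one sketch
    per (window start, dimension combination)."""
    rows = defaultdict(TinyHLL)
    for timestamp, dims, user_id in events:
        window_start = timestamp // window_seconds * window_seconds
        rows[(window_start, dims)].add(user_id)
    return dict(rows)


def distinct_count(rows, predicate=lambda window_start, dims: True):
    """Merge the sketches of all matching pre-aggregated rows and estimate."""
    merged = TinyHLL()
    for (window_start, dims), sketch in rows.items():
        if predicate(window_start, dims):
            merged.merge(sketch)
    return merged.estimate()
```

With this layout the number of stored rows depends on the number of dimension combinations per window rather than on the raw event count, which is where the volume reduction comes from. The actual ratio depends on the cardinality of the dimensions and on the serialized sketch size.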
