ianvkoeppe commented on issue #4293: Add support for an aggregation function returning serialized hyperlog…
URL: https://github.com/apache/incubator-pinot/pull/4293#issuecomment-501022309

> From the original description, I thought your requirement was something as follows:
> select distinctCountHLL(memberId) from T where pageId in (...... 10,000 IDs)
> and you wanted to solve this by running multiple (10 in this case) queries of the form
> select DISTINCTCOUNTRAWHLL(memberId) from T where pageId in (1000 IDs ...)

The only thing missing is that we might also have a dimension to group by, for example `SELECT DISTINCTCOUNTRAWHLL(precomputedHLLColumn) FROM T WHERE pageId IN (1000 IDs) GROUP BY JOB_TITLE`. Depending on the characteristics of the users visiting the pages, this could return tens of thousands of job titles.

As I also mentioned, we have 50+ metrics in the table, so a request would look more like `SELECT SUM(col1), SUM(col2), ..., DISTINCTCOUNTRAWHLL(...)`. The combination of a large table with many IDs and a high-cardinality dimension (job titles) means the response size could be arbitrarily large unless we make the filter stricter by batching IDs.
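To make the batching idea concrete, here is a minimal client-side sketch in Python. It only builds the query strings. It does not talk to Pinot or merge the returned serialized HyperLogLogs. The table and column names (`T`, `pageId`, `JOB_TITLE`, `precomputedHLLColumn`) come from the queries above, while the helper names `batch_ids`, `build_query`, and `build_batched_queries` and the default batch size of 1000 are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence


def batch_ids(ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


def build_query(page_ids: Sequence[int]) -> str:
    """Build one grouped query for a single batch of page IDs.

    Hypothetical helper: IDs are coerced to int before being inlined,
    so arbitrary strings cannot end up in the query text.
    """
    id_list = ", ".join(str(int(page_id)) for page_id in page_ids)
    return (
        "SELECT DISTINCTCOUNTRAWHLL(precomputedHLLColumn) FROM T "
        f"WHERE pageId IN ({id_list}) GROUP BY JOB_TITLE"
    )


def build_batched_queries(page_ids: Sequence[int], batch_size: int = 1000) -> List[str]:
    """Return one query per batch, so each response covers fewer page IDs."""
    return [build_query(batch) for batch in batch_ids(page_ids, batch_size)]


if __name__ == "__main__":
    # 10,000 page IDs split into batches of 1000 gives 10 queries.
    queries = build_batched_queries(list(range(10_000)), batch_size=1000)
    print(len(queries))
```

Each query still groups by `JOB_TITLE`, so the number of rows per response is bounded by the job titles seen for that batch of pages, not for all 10,000 pages at once. The per-batch results would then need to be merged per job title on the client, which this sketch leaves out.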
