ianvkoeppe edited a comment on issue #4293: Add support for an aggregation function returning serialized hyperlog…
URL: https://github.com/apache/incubator-pinot/pull/4293#issuecomment-501022309

> From the original description, I thought your requirement was something as follows:
> `select distinctCountHLL(memberId) from T where pageId in (...... 10,000 IDs)`
> and you wanted to solve this by running multiple (10 in this case) queries of the form
> `select DISTINCTCOUNTRAWHLL(memberId) from T where pageId in (1000 IDs ...)`

The only thing missing is that we might also have a dimension to group by, e.g. `SELECT DISTINCTCOUNTRAWHLL(precomputedHLLColumn) FROM T WHERE pageId IN (1000 IDs) GROUP BY JOB_TITLE`. Depending on the characteristics of the users visiting the pages, this could be tens of thousands of job titles. As I also mentioned, we have 50+ metrics in the table, so a request would look more like `SELECT SUM(col1), SUM(col2), ..., DISTINCTCOUNTRAWHLL(...)`. So the combination of a large table with many IDs and a large dimension (job titles) means the response size could be arbitrarily large unless we make the filter stricter by batching IDs.

> Just to clarify, I think this is a valid feature request to have in Pinot and I will review it shortly.

Cool, glad to explain our use case more thoroughly if it helps improve documentation or solutions for future devs. I'm also interested in whether you think an existing solution works for this first use case.
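A minimal client-side sketch of the batching approach described above, under stated assumptions. The batch size of 1000 and the query shape come from the comment; the helper names (`batch_ids`, `build_query`, `merge_batches`) are hypothetical. The point of a *raw* HLL is that per-batch distinct counts cannot simply be added (the same member may visit pages in several batches), whereas `SUM` metrics can. HyperLogLog sketches with the same register count, however, merge exactly by taking the element-wise maximum of their registers. The sketch assumes each batch's serialized HLL has already been decoded into a list of register values per group; real `DISTINCTCOUNTRAWHLL` output would first need deserializing with a compatible HyperLogLog implementation.

```python
from typing import Dict, List, Sequence

BATCH_SIZE = 1000  # batch size taken from the example in the comment


def batch_ids(ids: Sequence[int], size: int = BATCH_SIZE) -> List[List[int]]:
    """Split the page IDs into consecutive batches of at most `size` IDs."""
    return [list(ids[i:i + size]) for i in range(0, len(ids), size)]


def build_query(table: str, hll_column: str, group_by: str, ids: Sequence[int]) -> str:
    """Build one batched query in the shape described in the comment."""
    id_list = ", ".join(str(i) for i in ids)
    return (
        f"SELECT DISTINCTCOUNTRAWHLL({hll_column}) FROM {table} "
        f"WHERE pageId IN ({id_list}) GROUP BY {group_by}"
    )


def merge_registers(a: List[int], b: List[int]) -> List[int]:
    """Merge two HLL register arrays of equal size by element-wise max."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of registers")
    return [max(x, y) for x, y in zip(a, b)]


def merge_batches(results: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Combine per-group sketches (assumed already decoded) from every batch."""
    merged: Dict[str, List[int]] = {}
    for batch in results:
        for group, registers in batch.items():
            if group in merged:
                merged[group] = merge_registers(merged[group], registers)
            else:
                merged[group] = list(registers)
    return merged


if __name__ == "__main__":
    page_ids = list(range(10_000))  # hypothetical stand-in for the real page IDs
    queries = [build_query("T", "precomputedHLLColumn", "JOB_TITLE", b)
               for b in batch_ids(page_ids)]
    print(len(queries))  # 10 batched queries
```

Each group's merged sketch can then be passed to the HLL cardinality estimator to get the distinct count across all batches, while the `SUM(...)` columns from each batch are simply added per group.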
----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
