ianvkoeppe commented on issue #4293: Add support for an aggregation function 
returning serialized hyperlog…
URL: https://github.com/apache/incubator-pinot/pull/4293#issuecomment-501022309
 
 
   > From the original description, I thought your requirement was something as 
follows - select distinctCountHLL(memberId) from T where pageId in (...... 
10,000 IDs)
   and you wanted to solve this by running multiple (10 in this case) queries 
of the form
   select DISTINCTCOUNTRAWHLL(memberId) from T where pageId in (1000 IDs ...)
   
   The only thing missing from that picture is that we may also group by a 
dimension:
   `SELECT DISTINCTCOUNTRAWHLL(precomputedHLLColumn) FROM T WHERE pageId IN 
(1000 IDs) GROUP BY JOB_TITLE`. Depending on which users visit the pages, 
this could return tens of thousands of job titles. As I also mentioned, we 
have 50+ metrics in the table, so a real request would look more like 
`SELECT SUM(col1), SUM(col2), ..., DISTINCTCOUNTRAWHLL(...)`. The combination 
of a large table, many IDs, and a high-cardinality dimension (job titles) 
means the response size could be arbitrarily large unless we tighten the 
filter by batching IDs.
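
   To make the batching idea concrete, here is a rough client-side sketch of 
how the ID list could be split into fixed-size batches, one query per batch. 
The helper names (`batch_ids`, `build_queries`), the batch size of 1000, and 
the way the query string is assembled are assumptions for illustration only, 
not part of Pinot or of this PR. Merging the serialized HyperLogLog results 
and the SUM values across batches is left out.

```python
from typing import Iterator, List, Sequence


def batch_ids(ids: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def build_queries(
    page_ids: Sequence[int],
    metric_columns: Sequence[str],
    hll_column: str,
    group_by: str,
    table: str = "T",
    batch_size: int = 1000,
) -> List[str]:
    """Build one query per batch of page IDs (hypothetical helper).

    Each query sums the metric columns, asks for the serialized HLL of
    `hll_column`, and groups by `group_by`, so a single response only
    covers the groups reachable from that batch of IDs.
    """
    select_items = [f"SUM({column})" for column in metric_columns]
    select_items.append(f"DISTINCTCOUNTRAWHLL({hll_column})")
    select_clause = ", ".join(select_items)

    queries = []
    for batch in batch_ids(page_ids, batch_size):
        id_list = ", ".join(str(page_id) for page_id in batch)
        queries.append(
            f"SELECT {select_clause} FROM {table} "
            f"WHERE pageId IN ({id_list}) GROUP BY {group_by}"
        )
    return queries


if __name__ == "__main__":
    # 10,000 page IDs split into 10 queries of 1,000 IDs each.
    example = build_queries(
        page_ids=list(range(10_000)),
        metric_columns=["col1", "col2"],
        hll_column="precomputedHLLColumn",
        group_by="JOB_TITLE",
    )
    print(len(example))
```

   The client would then issue these queries separately and combine the 
per-group results itself, which keeps each individual response bounded by 
the batch size rather than by the full ID list.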
