peferron edited a comment on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-480718601
 
 
   From your description, it sounds like it could handle the "top songs by
unique viewers" query we discussed earlier, by processing pairs of {SongID,
UserID}. Essentially, it would behave like a topN over HLL sketches, but with
accuracy guarantees and still better performance than an exact query. If
that's the case, it sounds extremely useful. I'd love to see how you plan to
implement that.
   
   It would be especially useful if additional aggregations could be computed 
at the same time, such as computing the total (non-distinct) count for each 
key: `SELECT COUNT(DISTINCT UserID), COUNT(*) GROUP BY IPAddress ORDER BY 1 
DESC LIMIT 10`. In my experience it's common to compute multiple metrics in the 
same query in Druid. This looks like something that would be implemented in the
Druid extension rather than in the sketch itself, although the sketch may need
to expose some APIs to support it.
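For comparison, here is a minimal exact baseline for that kind of query, computed from a stream of {key, UserID} pairs. It is not the proposed sketch. The function name `exact_top_n_distinct` is hypothetical, and keeping one in-memory set of users per key is an assumption chosen for clarity. That per-key set is what makes the exact query expensive, since memory grows with the number of distinct (key, user) pairs.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def exact_top_n_distinct(
    pairs: Iterable[tuple[Hashable, Hashable]], n: int = 10
) -> list[tuple[Hashable, int, int]]:
    """Hypothetical exact baseline: per key, COUNT(DISTINCT user) and
    COUNT(*), ranked by the distinct count (descending), top n rows.

    Returns (key, distinct_users, total_rows) tuples. Ties keep the order
    in which keys were first seen, because Python's sort is stable.
    """
    users_by_key: dict = defaultdict(set)  # key -> set of distinct users
    rows_by_key: dict = defaultdict(int)   # key -> total (non-distinct) rows
    for key, user in pairs:
        users_by_key[key].add(user)
        rows_by_key[key] += 1
    ranked = sorted(
        ((k, len(users), rows_by_key[k]) for k, users in users_by_key.items()),
        key=lambda row: row[1],  # order by the distinct count
        reverse=True,
    )
    return ranked[:n]
```

For example, with pairs `[("a", 1), ("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)]` and `n=2`, this returns `[("c", 3, 3), ("a", 2, 3)]`: key "a" has 3 rows but only 2 distinct users, which is exactly the difference between the two metrics.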
   
   Once you have a draft extension ready I'll be happy to run it against some 
of our real-world datasets if that helps. I'll even pick some dimensions where 
topN fails badly for comparison 😄. Unfortunately, I won't have time to help
develop the extension itself.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.