peferron edited a comment on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-480718601

From your description, it sounds like it could handle the "top songs by unique viewers" query we discussed earlier, by processing pairs of {SongID, UserID}. Essentially like a topN with HLL, but with accuracy guarantees, and still with better performance than an exact query. If that's the case, then it sounds extremely useful. I'd love to see how you plan to implement that.

It would be especially useful if additional aggregations could be computed at the same time, such as the total (non-distinct) count for each key: `SELECT COUNT(DISTINCT UserID), COUNT(*) GROUP BY IPAddress ORDER BY 1 DESC LIMIT 10`. In my experience it's common to compute multiple metrics in the same query in Druid. This looks like something that would be implemented in the Druid extension rather than in the sketch itself, although the sketch may need to expose some APIs to support it.

Once you have a draft extension ready, I'll be happy to run it against some of our real-world datasets if that helps. I'll even pick some dimensions where topN fails badly for comparison 😄. Unfortunately, I won't have time to help develop the extension itself.
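
For reference, here is a minimal sketch of the exact result that such a sketch-based topN would approximate: rank keys by distinct users while also tracking the total (non-distinct) row count per key. It assumes events arrive as (key, user_id) pairs; the function name `top_n_by_distinct` and the sample data are hypothetical, and this is a plain in-memory baseline, not the sketch and not a Druid API.

```python
from collections import defaultdict
import heapq


def top_n_by_distinct(pairs, n):
    """Exact baseline: rank keys by distinct users, also reporting total rows.

    pairs: iterable of (key, user_id) tuples (hypothetical input shape).
    Returns a list of (key, distinct_users, total_rows), highest distinct first.
    """
    users = defaultdict(set)   # key -> set of distinct user ids seen
    totals = defaultdict(int)  # key -> non-distinct row count
    for key, user_id in pairs:
        users[key].add(user_id)
        totals[key] += 1
    rows = ((key, len(seen), totals[key]) for key, seen in users.items())
    # Order by distinct count, then by total count, both descending.
    return heapq.nlargest(n, rows, key=lambda r: (r[1], r[2]))


# Hypothetical sample events: (SongID, UserID) pairs.
events = [
    ("song_a", "u1"), ("song_a", "u1"), ("song_a", "u2"),
    ("song_b", "u1"), ("song_b", "u2"), ("song_b", "u3"),
    ("song_c", "u4"),
]
print(top_n_by_distinct(events, 2))
```

This baseline keeps every distinct (key, user) pair in memory, which is the cost an approximate approach would be trying to avoid on high-cardinality dimensions.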
