peferron commented on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-481112318

> The more generic we try to make this, the more challenging it will be to configure and the performance will be impacted.

Understood. Extra aggregations can still be computed in a follow-up query anyway, such as `SELECT COUNT(*), AVG(SomeColumn) WHERE IPAddress IN ('a', 'b', 'c')`, where `a`, `b` and `c` are the top IP addresses previously returned by the FUN query. This can work decently well with Druid indexes.

One advantage of this approach is that the FUN query returns faster since it isn't loaded with extra aggregations, so you can immediately show the top items and unique counts to the user, with the extra metrics following later. In some cases that's better UX than a longer initial wait followed by showing everything at once.

That's a bit off-topic, but hopefully there's some value in listing the different ways in which this sketch could be used within Druid.

> Can we start with just the "top songs by unique users" and characterize that first?

Sure. Are you looking for any specific patterns in the test data? I assume that Yahoo already has large datasets and Druid clusters that could be used, so I'm trying to see what I could bring to the table here.

> Will you need an actual published artifact Jar to test this. Or would a jar generated from a branch be OK for your testing?

The best would be a branch that I can check out to build the extension from source.
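
To make the follow-up query idea mentioned earlier in this comment concrete, here is a minimal sketch that builds the SQL text from the items returned by the first query. The helper name `build_followup_query` and the datasource name `my_datasource` are hypothetical, and the `FROM` clause is an addition, since the inline example above omits it. The sketch only builds a string; how it gets submitted to Druid is left out.

```python
def build_followup_query(table, column, top_values, metric_column):
    """Build follow-up SQL restricted to the items returned by the first query.

    All names passed in are illustrative; nothing here calls Druid itself.
    """
    if not top_values:
        raise ValueError("top_values must not be empty")
    # Escape embedded single quotes by doubling them (standard SQL literal rule).
    literals = ", ".join("'" + v.replace("'", "''") + "'" for v in top_values)
    return (
        f'SELECT COUNT(*), AVG("{metric_column}") '
        f'FROM "{table}" '
        f'WHERE "{column}" IN ({literals})'
    )


# Hypothetical usage: 'a', 'b', 'c' stand in for the top IP addresses.
query = build_followup_query("my_datasource", "IPAddress", ["a", "b", "c"], "SomeColumn")
```

Adding `"IPAddress"` to the select list together with a `GROUP BY "IPAddress"` would return the extra metrics per item rather than across all top items, which is probably what a UI would want.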
