peferron commented on issue #7187: Improve topN algorithm
URL: 
https://github.com/apache/incubator-druid/issues/7187#issuecomment-481112318
 
 
   > The more generic we try to make this, the more challenging it will be to 
configure and the performance will be impacted.
   
   Understood. Extra aggregations can still be computed in a follow-up query 
anyway, such as `SELECT COUNT(*), AVG(SomeColumn) WHERE IPAddress IN ('a', 'b', 
'c')` where `a`, `b` and `c` are top IP addresses previously returned by the 
FUN query. This can work decently well with Druid indexes. One advantage of 
this approach is that the FUN query returns faster since it's not loaded with 
extra aggregations, so you can immediately show the top items & unique 
counts to the user, with extra metrics following later. In some cases that's 
better UX than a longer initial wait followed by showing everything at 
once. That's a bit off-topic, but hopefully there's some value in listing the 
different ways in which this sketch could be used within Druid.
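   
   To make the two-step pattern concrete, here is a minimal Python sketch of 
how a client could build the follow-up query from the keys returned by the 
first query. The function name `build_follow_up_sql`, the table name `events` 
and the `GROUP BY` clause (added to get per-item metrics) are assumptions for 
illustration only; none of them come from Druid or from the proposed sketch.

```python
def build_follow_up_sql(table, key_column, top_keys):
    """Build a follow-up aggregation query restricted to the given top keys.

    Hypothetical helper: `table`, `key_column` and the metric column
    "SomeColumn" are placeholders, not names defined by Druid.
    """
    if not top_keys:
        raise ValueError("top_keys must not be empty")
    # Escape single quotes by doubling them, as in standard SQL string literals.
    literals = ", ".join("'" + key.replace("'", "''") + "'" for key in top_keys)
    return (
        f'SELECT "{key_column}", COUNT(*), AVG("SomeColumn") '
        f'FROM "{table}" '
        f'WHERE "{key_column}" IN ({literals}) '
        f'GROUP BY "{key_column}"'
    )


# Step 1 (not shown): the top-N query returns the top keys quickly.
top_ips = ["a", "b", "c"]
# Step 2: the extra metrics are fetched only for those keys.
follow_up = build_follow_up_sql("events", "IPAddress", top_ips)
```

   The first result set can be rendered as soon as it arrives, and the second 
query only has to scan rows that match the small `IN` list.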
   
   > Can we start with just the "top songs by unique users" and characterize 
that first?
   
   Sure. Are you looking for any specific patterns in the test data? I assume 
that Yahoo already has large datasets & Druid clusters that could be used, so 
I'm trying to see what I could bring to the table here.
   
   > Will you need an actual published artifact Jar to test this.  Or would a 
jar generated from a branch be OK for your testing?
   
   The best would be a branch that I can check out to build the extension from 
source.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
