peferron edited a comment on issue #7187: Improve topN algorithm
URL: 
https://github.com/apache/incubator-druid/issues/7187#issuecomment-477396898
 
 
   @leerho I think we're on the same page regarding the behavior of the current 
topN. Our disagreement comes from what we consider "meaningful", which is more 
subjective.
   
   For you, "meaningful" means returning useful and correct results regardless 
of the data being queried.
   
   For me, it's OK if correctness depends on the data; for example, I consider 
binary search to be useful and capable of returning meaningful results, even 
though it returns garbage on unsorted data.
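   
   A minimal Python sketch of the binary search analogy, using the 
standard-library `bisect` module. The helper name `contains` and the sample 
lists are made up for illustration:

```python
from bisect import bisect_left

def contains(data, target):
    """Binary-search membership test; only valid if data is sorted ascending."""
    i = bisect_left(data, target)
    return i < len(data) and data[i] == target

sorted_data = [1, 3, 5, 7, 9]
unsorted_data = [9, 1, 7, 3, 5]  # same values, precondition violated

print(contains(sorted_data, 9))    # True
print(contains(unsorted_data, 9))  # False, although 9 is in the list
```

The algorithm is unchanged between the two calls; only the input's suitability 
differs.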
   
   You probably also consider binary search to be useful. So the main 
difference is, I think, that it's usually simple to ensure that data fed to 
binary search is sorted, but much harder to ensure that data fed to topN is 
suitable across all combinations of aggregations/filters/intervals. That's the 
main practical problem with topN IMO. But the difficulty here depends on the 
size of the query space, which is application-specific, so I don't think there 
can be a single yes/no answer regarding whether one should use topN or not.
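   
   To make the data dependence concrete, here is a toy Python model. It is a 
sketch under stated assumptions, not Druid's actual implementation: each 
"segment" is a plain dict of per-key sums, the approximate variant keeps only 
the top `k` keys of each segment before merging, and the function names are 
hypothetical.

```python
from collections import Counter

def _ranked(counts):
    # Highest value first; ties broken by key for deterministic output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def exact_top_n(segments, n):
    """Brute force: merge every key from every segment, then rank."""
    total = Counter()
    for seg in segments:
        total.update(seg)
    return _ranked(total)[:n]

def approx_top_n(segments, n, k):
    """Toy approximation: keep only each segment's top k keys, then merge."""
    merged = Counter()
    for seg in segments:
        merged.update(dict(_ranked(seg)[:k]))
    return _ranked(merged)[:n]

# The same key leads in every segment: truncation loses nothing that matters.
friendly = [{"a": 10, "b": 5, "c": 1}, {"a": 9, "b": 6, "c": 2}]
# "c" is second everywhere but first overall: truncation drops it entirely.
hostile = [{"a": 10, "c": 9, "b": 1}, {"b": 10, "c": 9, "a": 1}]

print(exact_top_n(friendly, 1), approx_top_n(friendly, 1, k=1))
# [('a', 19)] [('a', 19)]
print(exact_top_n(hostile, 1), approx_top_n(hostile, 1, k=1))
# [('c', 18)] [('a', 10)]
```

Whether a real workload looks more like `friendly` or `hostile` depends on the 
combination of aggregations, filters, and intervals being queried.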
   
   It's interesting that you mention running brute force queries to verify topN 
correctness. It's actually quite viable, because the topN query quickly returns 
results to the user while the brute force (groupBy) query runs in the 
background using spare cluster resources. So the user experience is improved, 
even though the total computation time and resources are indeed larger than 
running only the brute force query.
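   
   A rough Python sketch of that "answer fast, verify in the background" 
pattern. The callables `run_fast` and `run_exact` are hypothetical stand-ins 
for issuing a topN query and a groupBy query; no Druid client API is assumed, 
and the hard-coded results are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_then_verify(run_fast, run_exact, executor):
    """Return the fast result now, plus a future that says whether it was exact."""
    fast_result = run_fast()  # the user sees this immediately
    # The slow exact query runs on a background worker and is compared later.
    check = executor.submit(lambda: run_exact() == fast_result)
    return fast_result, check

with ThreadPoolExecutor(max_workers=1) as pool:
    result, check = answer_then_verify(
        run_fast=lambda: [("a", 19)],   # placeholder for a topN response
        run_exact=lambda: [("a", 19)],  # placeholder for a groupBy response
        executor=pool,
    )
    print(result)              # [('a', 19)]
    verified = check.result()  # blocks until the background check finishes
    print(verified)            # True
```

If the check comes back `False`, the application can flag or replace the result 
it already showed.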

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
