peferron edited a comment on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-477396898

@leerho I think we're on the same page regarding the behavior of the current topN. Our disagreement comes from what we consider "meaningful", which is more subjective.

For you, "meaningful" means returning useful and correct results regardless of the data being queried. For me, it's OK if correctness depends on the data; for example, I consider binary search to be useful and capable of returning meaningful results, even though it returns garbage on unsorted data. You probably also consider binary search to be useful.

So the main difference is, I think, that it's usually simple to ensure that data fed to binary search is sorted, but much harder to ensure that data fed to topN is suitable along all combinations of aggregations/filters/intervals. That's the main practical problem with topN IMO. But the difficulty here depends on the size of the query space, which is application-specific, so I don't think there can be a single yes/no answer regarding whether one should use topN or not.

It's interesting that you mention running brute force queries to verify topN correctness. It's actually quite viable, because the topN query quickly returns results to the user while the brute force (groupBy) query runs in the background using spare cluster resources. So the user experience is improved, even though the total computation time and resources are indeed larger than running only the brute force query.
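
As a rough illustration of the "answer fast, verify in the background" idea above, here is a toy Python sketch. It is not Druid code and uses no Druid API: `approx_top_n`, `exact_top_n`, `query_with_verification`, and `SEGMENTS` are all hypothetical names, and the sketch assumes the approximation error comes from each segment reporting only its local top entries before the merge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-segment counts. "c" is never first in a segment,
# yet it has the largest total across segments.
SEGMENTS = [
    {"a": 10, "c": 9},
    {"b": 11, "c": 9},
]


def approx_top_n(segments, n, per_segment_limit):
    """Toy approximate topN: merge only each segment's local top entries."""
    merged = Counter()
    for segment in segments:
        for key, count in Counter(segment).most_common(per_segment_limit):
            merged[key] += count
    return merged.most_common(n)


def exact_top_n(segments, n):
    """Brute-force equivalent (the groupBy role): merge everything, then rank."""
    merged = Counter()
    for segment in segments:
        merged.update(segment)
    return merged.most_common(n)


def query_with_verification(segments, n, per_segment_limit, executor):
    """Return the fast answer now, plus a future holding the exact answer."""
    exact_future = executor.submit(exact_top_n, segments, n)
    fast = approx_top_n(segments, n, per_segment_limit)
    return fast, exact_future


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        fast, exact_future = query_with_verification(SEGMENTS, 1, 1, pool)
        print("fast answer:", fast)
        exact = exact_future.result()  # arrives later in a real system
        print("exact answer:", exact)
        print("fast answer verified:", fast == exact)
```

With this data the fast answer disagrees with the exact one when each segment keeps a single entry, and agrees when each segment keeps two, which is the data-dependent correctness discussed above. The total work is larger than running only the exact query; the gain is that the caller sees a result before verification finishes.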
