leerho commented on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-472623400

Just to be clear, and independent of FIS: I'm not advocating removing TopN. What is very problematic in the current implementation is allowing aggregation functions after limiting / truncation of the data. This can lead to wildly wrong results, such that it is misleading to even call it "TopN" anymore. It is very easy to prove, by the way, that you could entirely miss the Top-1, and with a little extra math you can show that it is not too hard to miss all of the TopN.

You can still have TopN, and you can still provide the aggregation functions, as long as they are all performed prior to any limiting/truncation. If you follow this, it changes the `max(k, 1000)` step to just `k`, which means you will be sending far less data to the broker. And now when the broker applies a `PriorityQueue(k)`, you will have a TopN that is no longer data sensitive and is quite robust, although it will likely be slower.

I realize that speed is all-important to Druid, and that is why I love Druid, as I am a speed freak too :) But allowing functionality that effectively corrupts the intent of the query is not a good idea, as it can come back to bite you no matter how much you caveat it in the documentation.

"If it doesn't have to work, it can meet any requirement."
"If you don't care about quality, you can achieve any objective."
-- G.M. Weinberg

Yes, perhaps the FIS should be a completely separate function; that is up to you. But it is the only mechanism that will allow a simple form of aggregation and "truncation" at the same time, and that is fast, data insensitive, and has known error.
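
To make the "missing the Top-1" claim concrete, here is a minimal sketch in Python. It is not Druid's code: the two segments, their counts, and the helper names `local_top` and `merge_top_k` are all hypothetical, chosen only to show how summing per-segment results after each segment has already been truncated can drop the true leader.

```python
import heapq
from collections import Counter


def local_top(counts, limit):
    """Hypothetical per-segment step: keep only the `limit` largest entries."""
    return dict(heapq.nlargest(limit, counts.items(), key=lambda kv: kv[1]))


def merge_top_k(partials, k):
    """Hypothetical broker step: sum the partial counts, then keep the top k."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts per key
    return heapq.nlargest(k, merged.items(), key=lambda kv: kv[1])


# Made-up data: "X" is only second in each segment but first overall.
segments = [
    {"A": 100, "X": 90, "B": 10},
    {"C": 95, "X": 90, "D": 10},
]

# Aggregate over all the data first, then limit.
exact = merge_top_k(segments, k=1)

# Limit each segment to its local top 1 first, then aggregate.
truncated = merge_top_k([local_top(s, 1) for s in segments], k=1)

print("aggregate then limit:", exact)      # [('X', 180)]
print("limit then aggregate:", truncated)  # [('A', 100)]
```

With these made-up numbers the true Top-1 is "X" with 180, yet the truncate-first path reports "A" with 100 and never sees "X" at all. A larger per-segment limit makes this less likely, but whether it happens still depends on how the data is distributed across segments.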
