gianm commented on issue #13987: URL: https://github.com/apache/druid/issues/13987#issuecomment-1964800158
In general it isn't an expectation that the HLL operations produce exactly the same estimate every time. I did ask the datasketches folks about this once, you can see the response if you have an account on ASF Slack: https://the-asf.slack.com/archives/CP0930GKG/p1682452090261029. TLDR is that repeatable order-insensitive results are not something that the datasketches team is going for with most of their sketches. So in Druid land we inherit this.

A couple of relevant comments in the discussion, from a datasketches developer:

> Sketches, by their design are approximate and/or probabilistic, and in general, the results from a sketch should be viewed as such. This means that the user should not expect either exact results, because sketches are approximate, nor expect exact repeatability of results even with identical inputs, because sketches should be treated as probabilistic.

and

> As soon as the user starts demanding exactness in terms of either accuracy or repeatability of results, it requires specific details of the sketch algorithm, and may not be achievable without compromising other properties of the sketch as Jon mentions.
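
As a rough illustration of how the same input can give different estimates depending on order, here is a toy Python sketch. It is not the DataSketches or Druid code: the names `to_pair` and `running_estimate` are hypothetical, and the running estimator (add the inverse of the current "state would change" probability on every register update) is an assumption chosen only to make the effect visible.

```python
import hashlib
import random


def to_pair(item, num_buckets):
    """Hash an item to a (bucket, rank) pair, in the style of HLL sketches."""
    h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
    bucket = h % num_buckets
    rest = h // num_buckets
    # rank = 1 + number of trailing zero bits in the remaining hash bits (capped)
    rank = 1
    while rest % 2 == 0 and rank < 32:
        rest //= 2
        rank += 1
    return bucket, rank


def running_estimate(pairs, num_buckets):
    """Toy order-dependent estimator; returns (final registers, estimate)."""
    registers = [0] * num_buckets
    estimate = 0.0
    for bucket, rank in pairs:
        if rank > registers[bucket]:
            # Probability that a fresh item would change the current state.
            p_change = sum(2.0 ** -r for r in registers) / num_buckets
            estimate += 1.0 / p_change
            registers[bucket] = rank
    return registers, estimate


if __name__ == "__main__":
    pairs = [to_pair(i, 16) for i in range(1000)]
    for seed in range(3):
        shuffled = pairs[:]
        random.Random(seed).shuffle(shuffled)
        registers, estimate = running_estimate(shuffled, 16)
        # The registers are the same for every ordering; the estimate need not be.
        print(seed, round(estimate, 1))
```

A hand-checkable case with two buckets: feeding `(0, 1)` then `(0, 2)` updates the register twice, while the reverse order updates it once. Both orders end with registers `[2, 0]`, but the accumulated estimates differ.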
