leerho commented on issue #693: URL: https://github.com/apache/datasketches-java/issues/693#issuecomment-3583717964
@thomasrebele, what both @jmalkin and @AlexanderSaydakov are trying to explain is that many sketch algorithms, including KLL, are probabilistic in nature. This means that any result you get is a random draw from a known probability distribution. This is true even if the sketch is fed the exact same input data in the same order.

From the mathematics of the proofs, we can determine the confidence interval that bounds the result you get. The accuracy "guarantee" of these sketches is just that: the result will be within the confidence interval (the bounds), whose width is set by your choice of "k" when configuring the sketch, with the stated confidence. This is by design.

If you force the sketch to be deterministic by fixing the seed of the hash function, you destroy the probabilistic guarantee, and the results of the sketch will certainly be biased by an unknown amount and in an unknown direction. So it makes no sense to provide an alternate method that "provides a Random number generator with a predefined seed".

What needs to change is not the behavior of the sketch, but your expectation that the results from a probabilistic algorithm will always be exactly the same. They won't be, and they can't be. They will be close, but not exactly the same.

Change your code so that a result is acceptable if it is within +/- some percent of some previous value. Choose a percentage threshold that is 3 to 4 times the epsilon of the sketch for the given k. This will decrease the probability of failure to almost zero.

Cheers,
Lee.
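To illustrate the kind of tolerance check suggested above, here is a minimal sketch in plain Java. It does not call the DataSketches API: the class `ToleranceCheck`, the method `withinTolerance`, and the epsilon value of 0.0133 are all hypothetical and chosen only for illustration. It also assumes the comparison is made on normalized ranks (fractions between 0 and 1), since that is how a KLL epsilon is usually expressed. In a real test, the epsilon should be obtained from the sketch for the configured k rather than hard-coded.

```java
public final class ToleranceCheck {

    private ToleranceCheck() {}

    /**
     * Returns true when two normalized ranks (fractions in [0, 1]) differ by
     * no more than multiplier * epsilon. Hypothetical helper, not part of
     * the DataSketches library.
     */
    public static boolean withinTolerance(double actualRank, double expectedRank,
                                          double epsilon, double multiplier) {
        if (epsilon < 0.0 || multiplier <= 0.0) {
            throw new IllegalArgumentException(
                "epsilon must be >= 0 and multiplier must be > 0");
        }
        return Math.abs(actualRank - expectedRank) <= multiplier * epsilon;
    }

    public static void main(String[] args) {
        // Assumed, illustrative epsilon; query the sketch for the real value.
        double epsilon = 0.0133;
        double expectedRank = 0.50; // value recorded from a previous run
        double observedRank = 0.52; // value produced by the current run

        // Accept the result if it is within 4 * epsilon of the earlier value.
        boolean ok = withinTolerance(observedRank, expectedRank, epsilon, 4.0);
        System.out.println("within tolerance: " + ok);
    }
}
```

A test written this way replaces an exact-equality assertion with a bound that the sketch is expected to satisfy with high probability. A larger multiplier makes spurious failures rarer, at the cost of a looser check.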
