leerho commented on issue #693:
URL: https://github.com/apache/datasketches-java/issues/693#issuecomment-3583717964

   @thomasrebele,
   What both @jmalkin and @AlexanderSaydakov are trying to explain to you is 
that many sketch algorithms, including KLL, are probabilistic by nature. This 
means that any result you get is a random draw from a known probability 
distribution. This is true even if the sketch is fed exactly the same input data 
in the same order. From the mathematical proofs we can determine the confidence 
interval that bounds the result you get. The accuracy "guarantee" of these 
sketches is exactly this: the result will fall within the confidence interval 
(bounds) with the stated confidence, and the width of that interval is 
determined by your choice of "k", the parameter that configures the sketch. 
This is by design.
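   In symbols, assuming the usual way these bounds are stated for KLL (epsilon is a normalized *rank* error, and the library documentation quotes its bounds at roughly 99% confidence), the guarantee reads:

```math
\Pr\!\left[\, \bigl|\hat{R}(x) - R(x)\bigr| \le \varepsilon(k) \,\right] \;\ge\; 1 - \delta
```

   Here R(x) is the true fraction of the n input items that are <= x, \hat{R}(x) is the sketch's estimate, epsilon(k) shrinks roughly in inverse proportion to k, and delta is the failure probability. Two runs over identical input are two separate draws, each satisfying this bound, so by a union bound they agree to within 2 epsilon(k) with probability at least 1 - 2 delta. They are not required to agree exactly.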
   
   If you force the sketch to be deterministic by fixing the seed of its random 
number generator, you destroy the probabilistic guarantee, and the results of 
the sketch will certainly be biased by an unknown amount and in an unknown 
direction. So it makes no sense to provide an alternate method that "provides a 
Random number generator with a predefined seed".
   
   What needs to change is not the behavior of the sketch, but your expectation 
that the results from a probabilistic algorithm will always be exactly the 
same. They won't be, and they can't be. They will be close, but not exactly the 
same. Change your code so that a result is accepted if it is within +/- some 
percent of a previous value. Choose a threshold that is 3 or 4 times the 
epsilon (the normalized rank error) of the sketch for your chosen k. This 
reduces the probability of a spurious failure to almost zero.
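   A minimal sketch of such a tolerance check, assuming your test can compute exact ranks from the raw data. Because KLL's epsilon is a normalized *rank* error, the check below applies the 3x multiplier in rank space: it accepts an estimated quantile if the true rank of the returned value lies within multiplier * epsilon of the rank that was requested. All names here (`QuantileTolerance`, `kllRankEpsilon`, `withinRankTolerance`) are hypothetical. `kllRankEpsilon` hard-codes an empirical fit (2.296 / k^0.9723) that is assumed to match the single-sided error the library reports. In real code, prefer the sketch's own `getNormalizedRankError(false)` and confirm it against the API of your datasketches-java version.

```java
import java.util.Arrays;

public final class QuantileTolerance {

    // Hypothetical helper. Assumed empirical fit for KLL's single-sided
    // normalized rank error; prefer the library's getNormalizedRankError(false).
    static double kllRankEpsilon(int k) {
        return 2.296 / Math.pow(k, 0.9723);
    }

    // Index of the first element strictly greater than key (input must be sorted).
    static int upperBound(double[] sorted, double key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Exact inclusive normalized rank of q: fraction of values <= q.
    static double trueRank(double[] sorted, double q) {
        return (double) upperBound(sorted, q) / sorted.length;
    }

    // Accept the sketch's estimate if its true rank is within
    // multiplier * epsilon of the requested rank.
    static boolean withinRankTolerance(double[] sorted, double estimate,
                                       double requestedRank, int k, double multiplier) {
        double tolerance = multiplier * kllRankEpsilon(k);
        return Math.abs(trueRank(sorted, estimate) - requestedRank) <= tolerance;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // 1.0 .. 1000.0
        }
        Arrays.sort(data); // already sorted here; real data must be sorted first
        int k = 200;
        // Pretend two runs of the sketch returned these medians.
        System.out.println(withinRankTolerance(data, 507.0, 0.5, k, 3.0)); // true
        System.out.println(withinRankTolerance(data, 600.0, 0.5, k, 3.0)); // false
    }
}
```

   With k = 200 the assumed epsilon is about 0.0133, so a 3x tolerance is about 0.04 in rank. A median estimate of 507 (true rank 0.507) passes, while 600 (true rank 0.6) is flagged. A test written this way absorbs the run-to-run variation described above but still catches a genuine regression that lands far outside the bound.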
   
   Cheers,
   Lee.

