thomasrebele commented on issue #693:
URL: 
https://github.com/apache/datasketches-java/issues/693#issuecomment-3597778915

   Thank you @AlexanderSaydakov and @leerho for the feedback. I understand that 
KLL is probabilistic in nature. Probabilistic updates (feeding the data to a 
single KLL sketch) are also not my issue. I'm happy with the non-determinism 
there. However, I still need to get a deterministic result for my use case: 
merging `n` KLL sketches into a single one.
   
   I agree that misusing the random number generator (RNG) might lead to bad 
sketches: if the same seed is used for every `KllSketch#merge(KllSketch, 
Random)` call, then the errors add up and become quite large. However, there's 
a way around this: when the RNG is initialized with a fixed seed once at the 
beginning and reused for all merge operations, the error seems to be the 
same as with the original method, which uses the RNG `KllSketch#random`.
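   
   To illustrate the difference between the two seeding strategies, here is a 
minimal sketch that uses only `java.util.Random`, not the DataSketches API. 
The class `MergeSeedDemo` and its methods are hypothetical names, and a single 
`nextBoolean()` call stands in for the random choice a merge makes, which is a 
simplifying assumption. With a shared RNG the sequence of choices is 
reproducible but varies from merge to merge; with a fresh RNG per merge every 
merge makes the same choice, so the errors can no longer cancel out.

```java
import java.util.Arrays;
import java.util.Random;

final class MergeSeedDemo {

    // Stand-in for the random choice made during one merge.
    // Assumption: one coin flip per merge; a real merge may draw more often.
    static boolean mergeCoinFlip(Random rng) {
        return rng.nextBoolean();
    }

    // Strategy A: seed one RNG at the beginning and reuse it for all merges.
    static boolean[] flipsWithSharedRng(long seed, int merges) {
        Random rng = new Random(seed);
        boolean[] flips = new boolean[merges];
        for (int i = 0; i < merges; i++) {
            flips[i] = mergeCoinFlip(rng);
        }
        return flips;
    }

    // Strategy B: create a fresh RNG with the same seed for every merge.
    static boolean[] flipsWithReseededRng(long seed, int merges) {
        boolean[] flips = new boolean[merges];
        for (int i = 0; i < merges; i++) {
            flips[i] = mergeCoinFlip(new Random(seed));
        }
        return flips;
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + Arrays.toString(flipsWithSharedRng(42L, 8)));
        System.out.println("reseeded: " + Arrays.toString(flipsWithReseededRng(42L, 8)));
    }
}
```

   Running strategy A twice with the same seed yields the same sequence, which 
is the determinism needed here, while strategy B repeats one value for every 
merge.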
   
   I've prepared some experiments based on the proposed PR. It seems to me that 
the deterministic merge is still good enough for my use case, as its errors 
are very similar to those of the original merge method. Please see 
https://github.com/thomasrebele/datasketches-java/commit/8abbddfea4e2dddc6849e6fe7c44dcd83dae17a5
 for the results and the code. I'm not sure whether I measured the normalized 
rank error correctly, as I could not find the exact definition. I'm happy to 
adapt the code if you point me to its definition.
   
   Would you have a look at my experiment, please? I'm happy to extend it if 
you think more experiments are necessary.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

