thomasrebele commented on issue #693: URL: https://github.com/apache/datasketches-java/issues/693#issuecomment-3597778915
Thank you @AlexanderSaydakov and @leerho for the feedback. I understand that KLL is probabilistic in nature, and probabilistic updates (feeding the data to a single KLL sketch) are not my issue; I'm happy with the non-determinism there. However, I still need a deterministic result for my use case: merging `n` KLL sketches into a single one.

I agree that misusing the random number generator (RNG) can lead to bad sketches: if the same seed is used for every `KllSketch#merge(KllSketch, Random)` call, the errors add up and become quite large. There is a way around this, though: when the RNG is initialized with a fixed seed once at the beginning and re-used across the merge operations, the error appears to be the same as with the original method that uses `KllSketch#random`.

I've prepared some experiments based on the proposed PR. To me, the deterministic merge still looks good enough for my use case, as its errors are very similar to those of the original merge method. Please see https://github.com/thomasrebele/datasketches-java/commit/8abbddfea4e2dddc6849e6fe7c44dcd83dae17a5 for the results and the code. I'm not sure whether I measured the normalized rank error correctly, as I could not find its exact definition; I'm happy to adapt the code if you point me to it.

Would you have a look at my experiment, please? I'm happy to extend it if you think more experiments are necessary.
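
To illustrate the usage pattern I have in mind, here is a minimal sketch. The two-argument `merge(KllSketch, Random)` overload is the one proposed in the PR and is not part of the released datasketches-java API; the sketch parameter `k`, the seed, and the test data are only illustrative assumptions.

```java
import java.util.Random;
import org.apache.datasketches.kll.KllDoublesSketch;

public class DeterministicMergeExample {
  public static void main(String[] args) {
    // Build a few partial sketches, e.g. one per data partition.
    KllDoublesSketch[] parts = new KllDoublesSketch[4];
    for (int p = 0; p < parts.length; p++) {
      parts[p] = KllDoublesSketch.newHeapInstance(200); // k = 200 chosen arbitrarily
      for (int i = 0; i < 100_000; i++) {
        parts[p].update(p * 100_000 + i);
      }
    }

    // One RNG, seeded once, shared by ALL merge calls.
    // Re-seeding it for every merge would correlate the compaction
    // decisions across merges and inflate the error.
    Random rng = new Random(42L);

    KllDoublesSketch merged = KllDoublesSketch.newHeapInstance(200);
    for (KllDoublesSketch part : parts) {
      // merge(KllSketch, Random) is the overload proposed in the PR (hypothetical here).
      merged.merge(part, rng);
    }

    System.out.println("median estimate: " + merged.getQuantile(0.5));
  }
}
```

The key point is that the `Random` instance is created once and shared by all merge calls, so each merge still makes independent compaction decisions, while the final result is reproducible for a given seed and merge order.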
