thomasrebele commented on issue #693: URL: https://github.com/apache/datasketches-java/issues/693#issuecomment-3710105210
@jmalkin, what do you mean by "Using a fixed seed across all sketches in a run"? Something like [lines 228 to 236 of ExperimentDeterministicMerge](https://github.com/thomasrebele/datasketches-java/commit/e540a85bd9021c3faeb3118bc1437311bd7a8671#diff-f15e17269b86ec14ccbe6dbc1aec506b9cd4fff5bf2be5f9cdda351447cecc92R228-R236)? That is indeed problematic, as the experiment shows.

The [new merge method](https://github.com/apache/datasketches-java/commit/6caa284a0ab01fee000ee0d42dd0d919ee387aed#diff-dacbef77551b8e94e3095811486b859725948654627b5141625c453f0ecad774) mentions that the error bounds might be broken. It could be renamed to, e.g., `unsafeMerge`, so that callers are aware of the problem. Certainly, using the API incorrectly may lead to incorrect results. However, in the use case I'm trying to support, there is a single run in which n KLL sketches are merged, and AFAIK the resulting sketch is never merged with another sketch afterwards. So the code follows the intended use of the algorithm. Unfortunately, the current API of the datasketches library does not make this possible.

I've added some other [experiments](https://github.com/thomasrebele/datasketches-java/commit/e540a85bd9021c3faeb3118bc1437311bd7a8671): I mocked the RNG so that it generates the sequence 0,1,0,1,... (or its companion 1,0,1,0,...). The errors are quite similar to the original KLL errors (see the entries named `alternating` in https://github.com/thomasrebele/datasketches-java/commit/e540a85bd9021c3faeb3118bc1437311bd7a8671). Interestingly, this is not the case for other sequences, e.g., 0,1,0,0,1,0,0,1,0,...
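To make the "mocked RNG" experiment concrete, here is a minimal, simplified sketch of a KLL-style level compaction whose keep-even/keep-odd choice comes from a pluggable bit source. It is not the datasketches-java implementation: the names `AlternatingCompactionDemo`, `alternatingBits`, and `compact` are hypothetical, and details of the real sketch such as level capacities and the handling of odd-sized buffers are omitted.

```java
import java.util.Arrays;
import java.util.function.BooleanSupplier;

public class AlternatingCompactionDemo {

    // Hypothetical bit source: yields false,true,false,... (0,1,0,1,...) or,
    // if startWithOne is set, the companion sequence true,false,true,... (1,0,1,0,...).
    static BooleanSupplier alternatingBits(boolean startWithOne) {
        final boolean[] next = {startWithOne};
        return () -> {
            boolean bit = next[0];
            next[0] = !bit;
            return bit;
        };
    }

    // Simplified KLL-style compaction of one level: sort the items, then keep
    // every second item starting at offset 0 (bit = 0) or offset 1 (bit = 1).
    // In a real sketch the kept items would move up one level with doubled weight.
    static double[] compact(double[] level, BooleanSupplier bits) {
        double[] sorted = level.clone();
        Arrays.sort(sorted);
        int offset = bits.getAsBoolean() ? 1 : 0;
        double[] kept = new double[(sorted.length - offset + 1) / 2];
        for (int i = 0; i < kept.length; i++) {
            kept[i] = sorted[offset + 2 * i];
        }
        return kept;
    }

    public static void main(String[] args) {
        BooleanSupplier bits = alternatingBits(false);
        double[] level0 = {5, 1, 4, 2, 3, 6};
        System.out.println(Arrays.toString(compact(level0, bits))); // [1.0, 3.0, 5.0]
        System.out.println(Arrays.toString(compact(level0, bits))); // [2.0, 4.0, 6.0]
    }
}
```

With `alternatingBits(false)`, successive compactions keep the even-indexed items and then the odd-indexed items. `alternatingBits(true)` yields the companion sequence. Swapping in a different `BooleanSupplier` is one way to reproduce the other sequences mentioned above.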
