freakyzoidberg opened a new pull request, #679: URL: https://github.com/apache/datasketches-java/pull/679

This pull request introduces a new test class, `HllSketchMergeOrderTest`, to investigate and demonstrate the order dependency of DataSketches HLL merge operations. The test shows that merging HLL sketches in different orders can produce different cardinality estimates, which has significant implications for applications that rely on consistent results.

### Key Additions:

#### New Test for Merge Order Dependency:

* Added the `HllSketchMergeOrderTest` class to demonstrate that merging HLL sketches in different orders (e.g., ABC, CBA, BAC) can lead to different cardinality estimates, showing that the merge operations are not always commutative or associative. This is especially evident with specific data patterns such as powers of 2.

#### Supporting Methods:

* Implemented the `createPowersOf2Sketch` method to generate HLL sketches from powers-of-2 values, which are prone to triggering the order dependency.
* Added the `mergeThreeSketches` method to merge three sketches in a specified order and return the cardinality estimate.

#### Documentation and Findings:

* Included detailed comments and documentation within the test class to explain the findings, the key implications, and the mathematical expectations that the observed behavior violates.
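
For reference, the "mathematical expectation" mentioned above can be illustrated with a toy model. The sketch below is not the code from this pull request and does not use the DataSketches API; it assumes a textbook HLL in which each sketch is a plain register array and a merge is a register-wise maximum. The class name `MergeOrderBaseline`, the helpers `powersOf2Registers`, `mergeThree`, and `mix`, and the choice of 256 registers are all hypothetical, chosen only to mirror the shape of the helpers described in the PR.

```java
import java.util.Arrays;

/** Toy register-array model of HLL merging. NOT the DataSketches implementation. */
final class MergeOrderBaseline {
    static final int LG_K = 8;        // assumption: 2^8 = 256 registers
    static final int K = 1 << LG_K;

    /** Stand-in 64-bit hash (the SplitMix64 finalizer); the real library hashes differently. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Builds a register array from the values 2^from .. 2^(to-1). */
    static byte[] powersOf2Registers(int from, int to) {
        byte[] regs = new byte[K];
        for (int e = from; e < to; e++) {
            long h = mix(1L << e);
            int idx = (int) (h >>> (64 - LG_K));                       // top LG_K bits pick the register
            int rank = Long.numberOfLeadingZeros((h << LG_K) | 1L) + 1; // leading-zero run of the rest, plus one
            regs[idx] = (byte) Math.max(regs[idx], rank);
        }
        return regs;
    }

    /** Textbook HLL merge: register-wise maximum. */
    static byte[] merge(byte[] a, byte[] b) {
        byte[] out = new byte[K];
        for (int i = 0; i < K; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    /** Merges three register arrays in the given order, e.g. {0, 1, 2} for ABC. */
    static byte[] mergeThree(byte[][] s, int[] order) {
        return merge(merge(s[order[0]], s[order[1]]), s[order[2]]);
    }

    /** Returns true if all six merge orders produce identical registers. */
    static boolean allOrdersAgree(byte[][] s) {
        int[][] orders = {{0, 1, 2}, {0, 2, 1}, {1, 0, 2}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0}};
        byte[] reference = mergeThree(s, orders[0]);
        for (int[] order : orders) {
            if (!Arrays.equals(reference, mergeThree(s, order))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[][] sketches = {
            powersOf2Registers(0, 20),
            powersOf2Registers(10, 40),
            powersOf2Registers(30, 63)
        };
        System.out.println("all orders agree: " + allOrdersAgree(sketches));
    }
}
```

Because a maximum is commutative and associative, this toy model prints `all orders agree: true` for any inputs. Any order dependence observed in the real library would therefore have to come from something other than the register-wise maximum itself, which is the behavior `HllSketchMergeOrderTest` is meant to surface.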