leerho commented on code in PR #676: URL: https://github.com/apache/datasketches-java/pull/676#discussion_r2229256095
########## src/main/java/org/apache/datasketches/count/CountMinSketch.java: ########## @@ -22,8 +22,8 @@ import org.apache.datasketches.common.Family; import org.apache.datasketches.common.SketchesArgumentException; import org.apache.datasketches.common.SketchesException; +import org.apache.datasketches.common.Util; import org.apache.datasketches.hash.MurmurHash3; -import org.apache.datasketches.tuple.Util; import java.io.ByteArrayOutputStream; import java.nio.ByteBuffer; Review Comment: I do not recommend the use of ByteBuffer or any of its subclasses. Since the main branch is targeted for Java 25, I recommend switching to use FFM MemorySegments. Here are my reasons: - ByteBuffer off-heap allocations cannot be interactively (dynamically) closed, they are managed by the GC. Thus out-of-memory scenarios can happen in large systems with millions of sketches because the GC can't keep up with new off-heap allocations. - The default endianness of BB is Big-Endian! and even if you intentionally set it to Little-Endian, it will silently switch back to BE after duplicate(), clear(), or slice(), operations. To make matters worse, this silent behavior of BB is not documented in the official Javadocs! - Managing endianness with BB requires a great deal of care and it is very easy to forget to constantly switch the endianness back to LE. Thus, I consider its use here in this class as dangerous since there is no explicit settings of endianness, (whether to LE or BE!). - Endianness is explicitly set with FFM (and with the older datasketches.memory classes) with no silent switching of endianness, thus much safer. - Our library is completely LE with one exception in t-Digest for binary compatibility with externally produced t-Digest sketches. Since the introduction of CountMin is new for the Java library it should use LE ordering, since, I assume, there is no history of CountMin sketches being stored. - Our Theta, Tuple and CPC sketches must be LE due to their compression algorithms, which rely on LE byte ordering. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@datasketches.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@datasketches.apache.org For additional commands, e-mail: dev-h...@datasketches.apache.org