leerho commented on code in PR #676:
URL: https://github.com/apache/datasketches-java/pull/676#discussion_r2229256095


##########
src/main/java/org/apache/datasketches/count/CountMinSketch.java:
##########
@@ -22,8 +22,8 @@
 import org.apache.datasketches.common.Family;
 import org.apache.datasketches.common.SketchesArgumentException;
 import org.apache.datasketches.common.SketchesException;
+import org.apache.datasketches.common.Util;
 import org.apache.datasketches.hash.MurmurHash3;
-import org.apache.datasketches.tuple.Util;
 
 import java.io.ByteArrayOutputStream;
 import java.nio.ByteBuffer;

Review Comment:
   I do not recommend the use of ByteBuffer or any of its subclasses.  Since 
the main branch is targeted for Java 25, I recommend switching to use FFM 
MemorySegments. Here are my reasons:
   - ByteBuffer off-heap allocations cannot be interactively (dynamically) 
closed, they are managed by the GC.  Thus out-of-memory scenarios can happen in 
large systems with millions of sketches because the GC can't keep up with new 
off-heap allocations.
   - The default endianness of BB is Big-Endian!  and even if you intentionally 
set it to Little-Endian, it will silently switch back to BE after duplicate(), 
clear(), or slice(), operations.  To make matters worse, this silent behavior 
of BB is not documented in the official Javadocs!  
   - Managing endianness with BB requires a great deal of care and it is very 
easy to forget to constantly switch the endianness back to LE.  Thus, I 
consider its use here in this class as dangerous since there is no explicit 
settings of endianness, (whether to LE or BE!).  
   - Endianness is explicitly set with FFM (and with the older 
datasketches.memory classes) with no silent switching of endianness, thus much 
safer.
   - Our library is completely LE with one exception in t-Digest for binary 
compatibility with externally produced t-Digest sketches.  Since the 
introduction of CountMin is new for the Java library it should use LE ordering, 
since, I assume, there is no history of CountMin sketches being stored.
   - Our Theta, Tuple and CPC sketches must be LE due to their compression 
algorithms, which rely on LE byte ordering.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@datasketches.apache.org
For additional commands, e-mail: dev-h...@datasketches.apache.org

Reply via email to