leerho commented on PR #476:
URL: https://github.com/apache/datasketches-cpp/pull/476#issuecomment-3898822733

   Thank you.  This helps a lot.  I appreciate your concern that we may need 
some means of validating C++ strings for cross-language integrity.  
   
   Is this just a C++ issue? Are there similar issues in Rust, Go, or Python?  
Also, Java recently introduced [zero-terminated 
strings](https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/foreign/MemorySegment.html#setString(long,java.lang.String,java.nio.charset.Charset))
 for interoperability with other languages (e.g., C).  
   
   I think there are alternatives to ICU, including 
[UTF8-CPP](https://github.com/nemtrif/utfcpp) and 
[simdjson](https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/).
  And from what I've read, validation, if done correctly, can be very fast and 
needs only a small section of code.
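
   As a rough illustration of how small such a validator can be, here is a 
minimal sketch in C++. The name `is_valid_utf8` is hypothetical and is not part 
of datasketches-cpp, UTF8-CPP, or ICU. It is a plain byte-by-byte check 
following RFC 3629, not the SIMD technique described in the blog post linked 
above, so it shows the rules rather than the fastest way to apply them.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (an assumption for illustration, not an existing API).
// Returns true if s is well-formed UTF-8 per RFC 3629. It rejects stray
// continuation bytes, truncated sequences, overlong encodings, UTF-16
// surrogates (U+D800..U+DFFF), and code points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = p[i];
    if (b0 < 0x80) { ++i; continue; }  // single-byte ASCII

    std::size_t len = 0;       // total bytes in this sequence
    std::uint32_t cp = 0;      // decoded code point
    std::uint32_t min_cp = 0;  // smallest code point allowed for this length
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte

    if (n - i < len) return false;  // sequence truncated at end of input
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = p[i + k];
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return false;                    // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                  // beyond Unicode range
    i += len;
  }
  return true;
}
```

   A caller could run a check like this on each string before serializing or 
after deserializing it. Whether a failure should throw or be reported some 
other way is a design choice left open here.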
   
   This concern would apply to any of our sketches that SerDe strings, not 
just Tuple.  This includes, for example, our KLL, REQ, Classic Quantiles, 
Sampling, FDT, and FrequentItems sketches.  
   
   So if we decide to introduce a validation function, we should make it 
available to all of the above classes and not just Tuple.  
   
   This is an important topic, and I'd like to get some more comments from 
others, as I do not consider myself an expert.
   Perhaps this should be posted on dev@ to get more feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

