tisonkun opened a new issue, #10: URL: https://github.com/apache/datasketches-rust/issues/10
Yeah. I think it's technically reasonable to have snapshots for tests. The key point here is how these snapshots get generated: can we make them reproducibly? And how can we modify or update them? In other words, how were the snapshots in datasketches-go originally generated?

_Originally posted by @tisonkun in https://github.com/apache/datasketches-rust/pull/1#discussion_r2616336904_

---

@freakyzoidberg told me there are some "tests" which generate those files in the Go repository (https://github.com/apache/datasketches-go/blob/f7bc4b1db865c2dd1be9134d8a61eeb8bc24b1c6/hll/hll_sketch_serialization_test.go#L29). I assumed it's similar for the other implementations.

_Originally posted by @notfilippo in https://github.com/apache/datasketches-rust/pull/1#discussion_r2616338736_

---

Great! Then at least we can reuse the Go logic to generate the Go snapshots. But I can see that it would require extra engineering effort, so I won't block this PR on such a potential improvement to avoid (mysterious) binaries as much as possible.

For the Java and C++ snapshots, perhaps @leerho and @AlexanderSaydakov can give some input here.

_Originally posted by @tisonkun in https://github.com/apache/datasketches-rust/pull/1#discussion_r2616343650_

---

The C++, Java, and Go repos do have some tests that generate and cross-test the synopses from the other repos.
It's very much convention and quite manual, and as Lee hinted in the other thread, we didn't really think about how to scale this with more languages (very much an M*N issue).

You can find the Java HLL x-check [here](https://github.com/apache/datasketches-java/blob/main/src/test/java/org/apache/datasketches/hll/HllSketchCrossLanguageTest.java) and the C++ ones for [ser](https://github.com/apache/datasketches-cpp/blob/master/hll/test/hll_sketch_serialize_for_java.cpp)/[de](https://github.com/apache/datasketches-cpp/blob/master/hll/test/hll_sketch_deserialize_from_java_test.cpp) there.

Also worth noting that not all synopses are guaranteed to have byte-for-byte equality. They'll behave the same, are logically equivalent from a behavior aspect, and are fully serializable/deserializable between languages, but not all of them guarantee idempotent generation when looking at raw bytes. TL;DR: not all RNGs are seeded (they could be, I suppose).

_Originally posted by @freakyzoidberg in https://github.com/apache/datasketches-rust/pull/1#discussion_r2616369040_

---

> very much M*N issue

Not quite. Each language can implement its own serialized snapshots, and any language should leverage the existing snapshots while patching its own. We can have a shared snapshot library, like the shared proto definitions in other projects.

Anyway, this is another topic, so I'll open a new issue to track it.

_Originally posted by @tisonkun in https://github.com/apache/datasketches-rust/pull/1#discussion_r2616639771_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
