Except for the static vs dynamic storage, I don't think the proposal I have above is too different from yours. At least the capabilities each language needs to provide are roughly the same. Perhaps I have thought of a few things you haven't listed and vice versa.
Detecting bitwise equality is not going to work as I explain above.

On Sun, Jan 11, 2026 at 2:05 PM Lee Rhodes <[email protected]> wrote:

> See my missive: https://github.com/apache/datasketches-rust/issues/10
>
> On Thu, Jan 8, 2026 at 7:24 PM tison <[email protected]> wrote:
>
>> > Is your plan to just use this repo for .sk files? I.e., just data.
>>
>> Nope. As you can see in [1], the snapshots will be checked in, the (generator) script would be checked in as well.
>>
>> [1] https://github.com/tisonkun/datasketches-testsuite
>>
>> > Is your plan to choose one language to generate the .sk files? Then what language? That means at least one language would have to implement all sketches. Right now that would be either Java or C++.
>>
>> Again, as shown in [1], I tend to collect snapshots in all languages, so that they can be checked mutually:
>>
>> serialization_test_data/cpp_generated_files
>> serialization_test_data/java_generated_files
>> (further) serialization_test_data/go_generated_files
>> (further) serialization_test_data/rust_generated_files
>>
>> Each language impl should have their own generator logics. So far, I glue those logics with a python script [2]. In rust, we can provide a binary target "gensnaps" to generate snapshots.
>>
>> [2] https://github.com/tisonkun/datasketches-testsuite/blob/main/gensnaps.py
>>
>> That is, instead of having one impl for all sketches, in datasketches-testsuite, each impl provides the sketches snapshots they implement for other impl to test over.
>>
>> > What do you mean by "Unstable snapshots"? How are you measuring "unstable"?
>>
>> See [3]. Once I checked in those snapshots, to ensure the snapshot generator are correct, I tend to regenerate it in CI and check the snapshots are bit-to-bit identical. But it seems the snapshots listed above (bf, kll, etc.) would change every time. Not sure it's the sketches' feature or we use some random value in the generation logics.
>>
>> [3] https://github.com/tisonkun/datasketches-testsuite/actions/runs/20801578720/job/59747439700
>>
>> Best,
>> tison.
>>
>> Lee Rhodes <[email protected]> wrote on Fri, Jan 9, 2026 at 10:15:
>> >
>> > Some thoughts and questions for clarification of your strategy:
>> >
>> > Is your plan to just use this repo for .sk files? I.e., just data.
>> > With shared code, we would have to partition sections of the repo for different languages.
>> > Nonetheless, when we have more than one repo sharing the same language (~4 for Java now), there is an opportunity to have a place here for shared run-time code (e.g., Hash functions, common math functions, common bit-twiddling code, etc). If we ever want to do this we might want a name that is more neutral than "testsuite"
>> >
>> > Is your plan to choose one language to generate the .sk files? Then what language? That means at least one language would have to implement all sketches. Right now that would be either Java or C++.
>> >
>> > What do you mean by "Unstable snapshots"? How are you measuring "unstable"?
>> >
>> > On Wed, Jan 7, 2026 at 5:02 PM tison <[email protected]> wrote:
>> >>
>> >> Looks like some sketches are unstable for generating. Not sure if we should make them stable; or, if that's impossible, ignore the diff and optionally check they are still logically compatible.
>> >>
>> >> The unstable snapshots are:
>> >>
>> >> * cpp_generated_files/bf_n0_h3_cpp.sk
>> >> * cpp_generated_files/bf_n0_h5_cpp.sk
>> >> * cpp_generated_files/bf_n10000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n10000_h5_cpp.sk
>> >> * cpp_generated_files/bf_n2000000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n2000000_h5_cpp.sk
>> >> * cpp_generated_files/bf_n30000000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n30000000_h5_cpp.sk
>> >> * cpp_generated_files/kll_double_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_double_n100000_cpp.sk
>> >> * cpp_generated_files/kll_double_n10000_cpp.sk
>> >> * cpp_generated_files/kll_double_n1000_cpp.sk
>> >> * cpp_generated_files/kll_float_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_float_n100000_cpp.sk
>> >> * cpp_generated_files/kll_float_n10000_cpp.sk
>> >> * cpp_generated_files/kll_float_n1000_cpp.sk
>> >> * cpp_generated_files/kll_string_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_string_n100000_cpp.sk
>> >> * cpp_generated_files/kll_string_n10000_cpp.sk
>> >> * cpp_generated_files/kll_string_n1000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n1000000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n100000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n10000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n1000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n1000000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n100000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n10000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n1000_cpp.sk
>> >> * cpp_generated_files/req_float_n1000000_cpp.sk
>> >> * cpp_generated_files/req_float_n100000_cpp.sk
>> >> * cpp_generated_files/req_float_n10000_cpp.sk
>> >> * cpp_generated_files/req_float_n1000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n1000000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n100000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n10000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n1000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n100_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_sampling_cpp.sk
>> >> * cpp_generated_files/varopt_union_double_sampling_cpp.sk
>> >>
>> >> * java_generated_files/bf_n0_h3_java.sk
>> >> * java_generated_files/bf_n0_h5_java.sk
>> >> * java_generated_files/bf_n10000_h3_java.sk
>> >> * java_generated_files/bf_n10000_h5_java.sk
>> >> * java_generated_files/bf_n2000000_h3_java.sk
>> >> * java_generated_files/bf_n2000000_h5_java.sk
>> >> * java_generated_files/bf_n30000000_h3_java.sk
>> >> * java_generated_files/bf_n30000000_h5_java.sk
>> >> * java_generated_files/kll_double_n1000000_java.sk
>> >> * java_generated_files/kll_double_n100000_java.sk
>> >> * java_generated_files/kll_double_n10000_java.sk
>> >> * java_generated_files/kll_double_n1000_java.sk
>> >> * java_generated_files/kll_float_n1000000_java.sk
>> >> * java_generated_files/kll_float_n100000_java.sk
>> >> * java_generated_files/kll_float_n10000_java.sk
>> >> * java_generated_files/kll_float_n1000_java.sk
>> >> * java_generated_files/kll_long_n1000000_java.sk
>> >> * java_generated_files/kll_long_n100000_java.sk
>> >> * java_generated_files/kll_long_n10000_java.sk
>> >> * java_generated_files/kll_long_n1000_java.sk
>> >> * java_generated_files/kll_string_n1000000_java.sk
>> >> * java_generated_files/kll_string_n100000_java.sk
>> >> * java_generated_files/kll_string_n10000_java.sk
>> >> * java_generated_files/kll_string_n1000_java.sk
>> >> * java_generated_files/quantiles_double_n1000000_java.sk
>> >> * java_generated_files/quantiles_double_n100000_java.sk
>> >> * java_generated_files/quantiles_double_n10000_java.sk
>> >> * java_generated_files/quantiles_double_n1000_java.sk
>> >> * java_generated_files/quantiles_string_n1000000_java.sk
>> >> * java_generated_files/quantiles_string_n100000_java.sk
>> >> * java_generated_files/quantiles_string_n10000_java.sk
>> >> * java_generated_files/quantiles_string_n1000_java.sk
>> >> * java_generated_files/req_float_n1000000_java.sk
>> >> * java_generated_files/req_float_n100000_java.sk
>> >> * java_generated_files/req_float_n10000_java.sk
>> >> * java_generated_files/req_float_n1000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n1000000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n100000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n10000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n1000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n100_java.sk
>> >> * java_generated_files/varopt_sketch_long_sampling_java.sk
>> >> * java_generated_files/varopt_union_double_sampling_java.sk
>> >>
>> >> Best,
>> >> tison.
>> >>
>> >> tison <[email protected]> wrote on Thu, Jan 8, 2026 at 08:53:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Following up on the discussion [1], I'd like to seek consensus to rename our existing but unused repo https://github.com/apache/datasketches-java-common to datasketches-testsuite to hold shared snapshot generator and (optional) serde tests.
>> >> >
>> >> > [1] https://github.com/apache/datasketches-rust/issues/10
>> >> >
>> >> > Here is a repo link that would replace the current content [2]. It contains:
>> >> >
>> >> > a. A script (gensnaps.py) to generate sketch snapshots for some language impls.
>> >> > b. Checked-in snapshots that can guard the generator behavior, and for language serde tests to easily download the snaps instead of generating in place.
>> >> > c. (Optionally) Run some basic snap tests. Not included yet.
>> >> >
>> >> > [2] https://github.com/tisonkun/datasketches-testsuite
>> >> >
>> >> > If we can reach a consensus, I'll open an INFRA ticket to ask the INFRA team to do the rename.
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Best,
>> >> > tison.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
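
For reference, the CI check described in the quoted thread (regenerate the snapshots, then compare them with the checked-in copies) could look roughly like the sketch below. It is only an illustration under stated assumptions: `compare_snapshots`, `file_digest`, and `UNSTABLE_PREFIXES` are hypothetical names and are not part of gensnaps.py, and the prefix list is simply taken from the file names reported as unstable in the thread. Files from those families are reported separately instead of failing the check, since a byte-for-byte comparison is not expected to hold for them.

```python
import hashlib
from pathlib import Path

# Assumption: file-name prefixes of the sketch families the thread lists as
# unstable (bf, kll, quantiles, req, varopt). Hypothetical constant.
UNSTABLE_PREFIXES = ("bf_", "kll_", "quantiles_", "req_", "varopt_")


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_snapshots(checked_in: Path, regenerated: Path):
    """Compare two snapshot trees byte-for-byte via their digests.

    Returns three sorted lists of relative paths:
    stable_diffs   - differing files that are expected to be reproducible
    unstable_diffs - differing files from families listed as unstable
    missing        - checked-in files that were not regenerated
    """
    stable_diffs, unstable_diffs, missing = [], [], []
    for old in sorted(checked_in.rglob("*.sk")):
        rel = old.relative_to(checked_in)
        new = regenerated / rel
        if not new.is_file():
            missing.append(rel.as_posix())
        elif file_digest(old) != file_digest(new):
            if old.name.startswith(UNSTABLE_PREFIXES):
                unstable_diffs.append(rel.as_posix())
            else:
                stable_diffs.append(rel.as_posix())
    return stable_diffs, unstable_diffs, missing
```

A CI job could fail only when `stable_diffs` or `missing` is non-empty, and hand the `unstable_diffs` files to a separate logical-compatibility test (deserialize and compare estimates), which this sketch does not attempt.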
