See my missive: https://github.com/apache/datasketches-rust/issues/10
On Thu, Jan 8, 2026 at 7:24 PM tison <[email protected]> wrote:
>
> > Is your plan to just use this repo for .sk files? I.e., just data.
>
> Nope. As you can see in [1], the snapshots will be checked in, and the
> generator script will be checked in as well.
>
> [1] https://github.com/tisonkun/datasketches-testsuite
>
> > Is your plan to choose one language to generate the .sk files? Then
> > what language? That means at least one language would have to implement
> > all sketches. Right now that would be either Java or C++.
>
> Again, as shown in [1], I intend to collect snapshots from all languages,
> so that they can be checked against each other:
>
> serialization_test_data/cpp_generated_files
> serialization_test_data/java_generated_files
> (future) serialization_test_data/go_generated_files
> (future) serialization_test_data/rust_generated_files
>
> Each language impl should have its own generator logic. So far, I
> glue that logic together with a Python script [2]. In Rust, we can
> provide a binary target "gensnaps" to generate snapshots.
>
> [2] https://github.com/tisonkun/datasketches-testsuite/blob/main/gensnaps.py
>
> That is, instead of having one impl for all sketches, in
> datasketches-testsuite each impl provides snapshots of the sketches it
> implements for the other impls to test against.
>
> > What do you mean by "Unstable snapshots"? How are you measuring
> > "unstable"?
>
> See [3]. Once I checked in those snapshots, to ensure the snapshot
> generators are correct, I regenerate them in CI and check that the
> snapshots are bit-for-bit identical. But it seems the snapshots listed
> above (bf, kll, etc.) change every time. I'm not sure whether that is a
> feature of the sketches or whether we use some random value in the
> generation logic.
>
> [3] https://github.com/tisonkun/datasketches-testsuite/actions/runs/20801578720/job/59747439700
>
> Best,
> tison.
>
> On Fri, Jan 9, 2026 at 10:15, Lee Rhodes <[email protected]> wrote:
> >
> > Some thoughts and questions for clarification of your strategy:
> >
> > Is your plan to just use this repo for .sk files? I.e., just data.
> > With shared code, we would have to partition sections of the repo for
> > different languages.
> > Nonetheless, when we have more than one repo sharing the same language
> > (~4 for Java now), there is an opportunity to have a place here for shared
> > run-time code (e.g., hash functions, common math functions, common
> > bit-twiddling code, etc.). If we ever want to do this, we might want a name
> > that is more neutral than "testsuite".
> >
> > Is your plan to choose one language to generate the .sk files? Then
> > what language? That means at least one language would have to implement
> > all sketches. Right now that would be either Java or C++.
> >
> > What do you mean by "Unstable snapshots"? How are you measuring
> > "unstable"?
> >
> > On Wed, Jan 7, 2026 at 5:02 PM tison <[email protected]> wrote:
> >>
> >> Looks like some sketches are unstable for generating. Not sure if we
> >> should make them stable; or, if that's impossible, ignore the diff and
> >> optionally check they are still logically compatible.
> >>
> >> The unstable snapshots are:
> >>
> >> * cpp_generated_files/bf_n0_h3_cpp.sk
> >> * cpp_generated_files/bf_n0_h5_cpp.sk
> >> * cpp_generated_files/bf_n10000_h3_cpp.sk
> >> * cpp_generated_files/bf_n10000_h5_cpp.sk
> >> * cpp_generated_files/bf_n2000000_h3_cpp.sk
> >> * cpp_generated_files/bf_n2000000_h5_cpp.sk
> >> * cpp_generated_files/bf_n30000000_h3_cpp.sk
> >> * cpp_generated_files/bf_n30000000_h5_cpp.sk
> >> * cpp_generated_files/kll_double_n1000000_cpp.sk
> >> * cpp_generated_files/kll_double_n100000_cpp.sk
> >> * cpp_generated_files/kll_double_n10000_cpp.sk
> >> * cpp_generated_files/kll_double_n1000_cpp.sk
> >> * cpp_generated_files/kll_float_n1000000_cpp.sk
> >> * cpp_generated_files/kll_float_n100000_cpp.sk
> >> * cpp_generated_files/kll_float_n10000_cpp.sk
> >> * cpp_generated_files/kll_float_n1000_cpp.sk
> >> * cpp_generated_files/kll_string_n1000000_cpp.sk
> >> * cpp_generated_files/kll_string_n100000_cpp.sk
> >> * cpp_generated_files/kll_string_n10000_cpp.sk
> >> * cpp_generated_files/kll_string_n1000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n1000000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n100000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n10000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n1000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n1000000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n100000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n10000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n1000_cpp.sk
> >> * cpp_generated_files/req_float_n1000000_cpp.sk
> >> * cpp_generated_files/req_float_n100000_cpp.sk
> >> * cpp_generated_files/req_float_n10000_cpp.sk
> >> * cpp_generated_files/req_float_n1000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n1000000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n100000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n10000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n1000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n100_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_sampling_cpp.sk
> >> * cpp_generated_files/varopt_union_double_sampling_cpp.sk
> >>
> >> * java_generated_files/bf_n0_h3_java.sk
> >> * java_generated_files/bf_n0_h5_java.sk
> >> * java_generated_files/bf_n10000_h3_java.sk
> >> * java_generated_files/bf_n10000_h5_java.sk
> >> * java_generated_files/bf_n2000000_h3_java.sk
> >> * java_generated_files/bf_n2000000_h5_java.sk
> >> * java_generated_files/bf_n30000000_h3_java.sk
> >> * java_generated_files/bf_n30000000_h5_java.sk
> >> * java_generated_files/kll_double_n1000000_java.sk
> >> * java_generated_files/kll_double_n100000_java.sk
> >> * java_generated_files/kll_double_n10000_java.sk
> >> * java_generated_files/kll_double_n1000_java.sk
> >> * java_generated_files/kll_float_n1000000_java.sk
> >> * java_generated_files/kll_float_n100000_java.sk
> >> * java_generated_files/kll_float_n10000_java.sk
> >> * java_generated_files/kll_float_n1000_java.sk
> >> * java_generated_files/kll_long_n1000000_java.sk
> >> * java_generated_files/kll_long_n100000_java.sk
> >> * java_generated_files/kll_long_n10000_java.sk
> >> * java_generated_files/kll_long_n1000_java.sk
> >> * java_generated_files/kll_string_n1000000_java.sk
> >> * java_generated_files/kll_string_n100000_java.sk
> >> * java_generated_files/kll_string_n10000_java.sk
> >> * java_generated_files/kll_string_n1000_java.sk
> >> * java_generated_files/quantiles_double_n1000000_java.sk
> >> * java_generated_files/quantiles_double_n100000_java.sk
> >> * java_generated_files/quantiles_double_n10000_java.sk
> >> * java_generated_files/quantiles_double_n1000_java.sk
> >> * java_generated_files/quantiles_string_n1000000_java.sk
> >> * java_generated_files/quantiles_string_n100000_java.sk
> >> * java_generated_files/quantiles_string_n10000_java.sk
> >> * java_generated_files/quantiles_string_n1000_java.sk
> >> * java_generated_files/req_float_n1000000_java.sk
> >> * java_generated_files/req_float_n100000_java.sk
> >> * java_generated_files/req_float_n10000_java.sk
> >> * java_generated_files/req_float_n1000_java.sk
> >> * java_generated_files/varopt_sketch_long_n1000000_java.sk
> >> * java_generated_files/varopt_sketch_long_n100000_java.sk
> >> * java_generated_files/varopt_sketch_long_n10000_java.sk
> >> * java_generated_files/varopt_sketch_long_n1000_java.sk
> >> * java_generated_files/varopt_sketch_long_n100_java.sk
> >> * java_generated_files/varopt_sketch_long_sampling_java.sk
> >> * java_generated_files/varopt_union_double_sampling_java.sk
> >>
> >> Best,
> >> tison.
> >>
> >> On Thu, Jan 8, 2026 at 08:53, tison <[email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Following up on the discussion [1], I'd like to seek consensus to
> >> > rename our existing but unused repo
> >> > https://github.com/apache/datasketches-java-common to
> >> > datasketches-testsuite to hold the shared snapshot generator and
> >> > (optional) serde tests.
> >> >
> >> > [1] https://github.com/apache/datasketches-rust/issues/10
> >> >
> >> > Here is a repo link that would replace the current content [2]. It
> >> > contains:
> >> >
> >> > a. A script (gensnaps.py) to generate sketch snapshots for some
> >> >    language impls.
> >> > b. Checked-in snapshots that guard the generator's behavior and let
> >> >    each language's serde tests download the snaps instead of
> >> >    generating them in place.
> >> > c. (Optionally) Some basic snap tests. Not included yet.
> >> >
> >> > [2] https://github.com/tisonkun/datasketches-testsuite
> >> >
> >> > If we can reach a consensus, I'll open an INFRA ticket to ask the
> >> > INFRA team to do the rename.
> >> >
> >> > What do you think?
> >> >
> >> > Best,
> >> > tison.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
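
For reference, the stability check described in the thread (regenerate the snapshots, then verify that they are bit-for-bit identical to the checked-in copies) could be sketched roughly as below. This is only an illustration and not the actual CI job or gensnaps.py: the function names `digest` and `unstable_snapshots` and the two-directory layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unstable_snapshots(checked_in: Path, regenerated: Path) -> list[str]:
    """List .sk files whose bytes differ between the two directory trees.

    Hypothetical helper: both arguments are assumed to be roots that contain
    subdirectories such as cpp_generated_files/ and java_generated_files/.
    A file that exists in only one tree is reported as well.
    """
    old = {p.relative_to(checked_in).as_posix(): digest(p)
           for p in checked_in.rglob("*.sk")}
    new = {p.relative_to(regenerated).as_posix(): digest(p)
           for p in regenerated.rglob("*.sk")}
    # A name is "unstable" when its digest differs or it is missing on one side.
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))
```

A CI step could call `unstable_snapshots(Path("serialization_test_data"), Path("regenerated"))` (the second path is hypothetical) and fail when the returned list is non-empty. If some sketches turn out to be inherently randomized, a byte-level comparison like this cannot pass for them, and a logical compatibility check would be needed instead.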
