See my missive: https://github.com/apache/datasketches-rust/issues/10

On Thu, Jan 8, 2026 at 7:24 PM tison <[email protected]> wrote:

> > Is your plan to just use this repo for .sk files?  I.e., just data.
>
> Nope. As you can see in [1], the snapshots will be checked in, and the
> generator script will be checked in as well.
>
> [1] https://github.com/tisonkun/datasketches-testsuite
>
> > Is your plan to choose one language to generate the .sk files?  Then
> > what language?  That means at least one language would have to implement
> > all sketches.  Right now that would be either Java or C++.
>
> Again, as shown in [1], I intend to collect snapshots from all languages,
> so that they can be checked against each other:
>
> serialization_test_data/cpp_generated_files
> serialization_test_data/java_generated_files
> (later) serialization_test_data/go_generated_files
> (later) serialization_test_data/rust_generated_files
>
> Each language impl should have its own generator logic. So far, I glue
> these together with a Python script [2]. In Rust, we can provide a
> binary target "gensnaps" to generate snapshots.
>
> [2]
> https://github.com/tisonkun/datasketches-testsuite/blob/main/gensnaps.py
>
> That is, instead of having one impl cover all sketches, in
> datasketches-testsuite each impl provides snapshots of the sketches it
> implements for the other impls to test against.
>
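
To make the cross-checking idea concrete, here is a minimal sketch of how a test harness might discover the snapshots written by the other implementations. It assumes the `<lang>_generated_files/<case>_<lang>.sk` naming seen in the file lists in this thread; the function names are hypothetical and are not taken from gensnaps.py.

```python
from collections import defaultdict
from pathlib import Path


def snapshots_by_case(root: Path) -> dict[str, dict[str, Path]]:
    """Group .sk files as {case_name: {language: path}}.

    Assumes directories named <lang>_generated_files that contain
    files named <case>_<lang>.sk (an assumption based on the listings).
    """
    cases: dict[str, dict[str, Path]] = defaultdict(dict)
    for lang_dir in sorted(root.glob("*_generated_files")):
        lang = lang_dir.name.removesuffix("_generated_files")
        for sk in sorted(lang_dir.glob(f"*_{lang}.sk")):
            case = sk.stem.removesuffix(f"_{lang}")
            cases[case][lang] = sk
    return dict(cases)


def snapshots_from_others(root: Path, own_lang: str) -> list[Path]:
    """Snapshots written by every implementation except own_lang."""
    return sorted(
        path
        for langs in snapshots_by_case(root).values()
        for lang, path in langs.items()
        if lang != own_lang
    )
```

A Rust serde test could then iterate over the equivalent of `snapshots_from_others(Path("serialization_test_data"), "rust")` and skip any case whose sketch type it does not implement yet.
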
> > What do you mean by "Unstable snapshots"?  How are you measuring
> > "unstable"?
>
> See [3]. After checking in those snapshots, I regenerate them in CI and
> check that the results are bit-for-bit identical, to ensure the snapshot
> generators are correct. But the snapshots listed in my earlier message
> (bf, kll, etc.) seem to change on every run. I'm not sure whether that is
> inherent to the sketches or whether the generation logic uses some
> random value.
>
> [3]
> https://github.com/tisonkun/datasketches-testsuite/actions/runs/20801578720/job/59747439700
>
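
The bit-for-bit check described above could look roughly like the following. This is only a sketch under stated assumptions: it compares two directory trees of `.sk` files by SHA-256 digest, and the function names and the two-directory setup are hypothetical, not what the CI job in [3] actually runs.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each .sk file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.sk"))
    }


def unstable_snapshots(checked_in: Path, regenerated: Path) -> list[str]:
    """Return relative paths whose bytes differ or that exist on only one side."""
    old = digest_tree(checked_in)
    new = digest_tree(regenerated)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

An empty result means the regenerated snapshots match the checked-in ones byte for byte; a non-empty result lists the files to investigate. It says nothing about whether differing files are still logically compatible, which would need a deserialize-and-compare test per sketch type.
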
> Best,
> tison.
>
> On Fri, Jan 9, 2026 at 10:15, Lee Rhodes <[email protected]> wrote:
> >
> > Some thoughts and questions for clarification of your strategy:
> >
> > Is your plan to just use this repo for .sk files?  I.e., just data.
> > With shared code, we would have to partition sections of the repo for
> > different languages.
> > Nonetheless, when we have more than one repo sharing the same language
> > (~4 for Java now), there is an opportunity to have a place here for shared
> > run-time code (e.g., Hash functions, common math functions, common
> > bit-twiddling code, etc).  If we ever want to do this we might want a name
> > that is more neutral than "testsuite"
> >
> > Is your plan to choose one language to generate the .sk files?  Then
> > what language?  That means at least one language would have to implement
> > all sketches.  Right now that would be either Java or C++.
> >
> > What do you mean by "Unstable snapshots"?  How are you measuring
> > "unstable"?
> >
> > On Wed, Jan 7, 2026 at 5:02 PM tison <[email protected]> wrote:
> >>
> >> It looks like some sketches do not generate stable snapshots. I'm not
> >> sure whether we should make them stable or, if that's impossible,
> >> ignore the diff and optionally check that they are still logically
> >> compatible.
> >>
> >> The unstable snapshots are:
> >>
> >> * cpp_generated_files/bf_n0_h3_cpp.sk
> >> * cpp_generated_files/bf_n0_h5_cpp.sk
> >> * cpp_generated_files/bf_n10000_h3_cpp.sk
> >> * cpp_generated_files/bf_n10000_h5_cpp.sk
> >> * cpp_generated_files/bf_n2000000_h3_cpp.sk
> >> * cpp_generated_files/bf_n2000000_h5_cpp.sk
> >> * cpp_generated_files/bf_n30000000_h3_cpp.sk
> >> * cpp_generated_files/bf_n30000000_h5_cpp.sk
> >> * cpp_generated_files/kll_double_n1000000_cpp.sk
> >> * cpp_generated_files/kll_double_n100000_cpp.sk
> >> * cpp_generated_files/kll_double_n10000_cpp.sk
> >> * cpp_generated_files/kll_double_n1000_cpp.sk
> >> * cpp_generated_files/kll_float_n1000000_cpp.sk
> >> * cpp_generated_files/kll_float_n100000_cpp.sk
> >> * cpp_generated_files/kll_float_n10000_cpp.sk
> >> * cpp_generated_files/kll_float_n1000_cpp.sk
> >> * cpp_generated_files/kll_string_n1000000_cpp.sk
> >> * cpp_generated_files/kll_string_n100000_cpp.sk
> >> * cpp_generated_files/kll_string_n10000_cpp.sk
> >> * cpp_generated_files/kll_string_n1000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n1000000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n100000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n10000_cpp.sk
> >> * cpp_generated_files/quantiles_double_n1000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n1000000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n100000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n10000_cpp.sk
> >> * cpp_generated_files/quantiles_string_n1000_cpp.sk
> >> * cpp_generated_files/req_float_n1000000_cpp.sk
> >> * cpp_generated_files/req_float_n100000_cpp.sk
> >> * cpp_generated_files/req_float_n10000_cpp.sk
> >> * cpp_generated_files/req_float_n1000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n1000000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n100000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n10000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n1000_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_n100_cpp.sk
> >> * cpp_generated_files/varopt_sketch_long_sampling_cpp.sk
> >> * cpp_generated_files/varopt_union_double_sampling_cpp.sk
> >>
> >> * java_generated_files/bf_n0_h3_java.sk
> >> * java_generated_files/bf_n0_h5_java.sk
> >> * java_generated_files/bf_n10000_h3_java.sk
> >> * java_generated_files/bf_n10000_h5_java.sk
> >> * java_generated_files/bf_n2000000_h3_java.sk
> >> * java_generated_files/bf_n2000000_h5_java.sk
> >> * java_generated_files/bf_n30000000_h3_java.sk
> >> * java_generated_files/bf_n30000000_h5_java.sk
> >> * java_generated_files/kll_double_n1000000_java.sk
> >> * java_generated_files/kll_double_n100000_java.sk
> >> * java_generated_files/kll_double_n10000_java.sk
> >> * java_generated_files/kll_double_n1000_java.sk
> >> * java_generated_files/kll_float_n1000000_java.sk
> >> * java_generated_files/kll_float_n100000_java.sk
> >> * java_generated_files/kll_float_n10000_java.sk
> >> * java_generated_files/kll_float_n1000_java.sk
> >> * java_generated_files/kll_long_n1000000_java.sk
> >> * java_generated_files/kll_long_n100000_java.sk
> >> * java_generated_files/kll_long_n10000_java.sk
> >> * java_generated_files/kll_long_n1000_java.sk
> >> * java_generated_files/kll_string_n1000000_java.sk
> >> * java_generated_files/kll_string_n100000_java.sk
> >> * java_generated_files/kll_string_n10000_java.sk
> >> * java_generated_files/kll_string_n1000_java.sk
> >> * java_generated_files/quantiles_double_n1000000_java.sk
> >> * java_generated_files/quantiles_double_n100000_java.sk
> >> * java_generated_files/quantiles_double_n10000_java.sk
> >> * java_generated_files/quantiles_double_n1000_java.sk
> >> * java_generated_files/quantiles_string_n1000000_java.sk
> >> * java_generated_files/quantiles_string_n100000_java.sk
> >> * java_generated_files/quantiles_string_n10000_java.sk
> >> * java_generated_files/quantiles_string_n1000_java.sk
> >> * java_generated_files/req_float_n1000000_java.sk
> >> * java_generated_files/req_float_n100000_java.sk
> >> * java_generated_files/req_float_n10000_java.sk
> >> * java_generated_files/req_float_n1000_java.sk
> >> * java_generated_files/varopt_sketch_long_n1000000_java.sk
> >> * java_generated_files/varopt_sketch_long_n100000_java.sk
> >> * java_generated_files/varopt_sketch_long_n10000_java.sk
> >> * java_generated_files/varopt_sketch_long_n1000_java.sk
> >> * java_generated_files/varopt_sketch_long_n100_java.sk
> >> * java_generated_files/varopt_sketch_long_sampling_java.sk
> >> * java_generated_files/varopt_union_double_sampling_java.sk
> >>
> >> Best,
> >> tison.
> >>
> >> On Thu, Jan 8, 2026 at 08:53, tison <[email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Following up on the discussion [1], I'd like to seek consensus to
> >> > rename our existing but unused repo
> >> > https://github.com/apache/datasketches-java-common to
> >> > datasketches-testsuite to hold a shared snapshot generator and
> >> > (optional) serde tests.
> >> >
> >> > [1] https://github.com/apache/datasketches-rust/issues/10
> >> >
> >> > Here is a repo link that would replace the current content [2]. It
> >> > contains:
> >> >
> >> > a. A script (gensnaps.py) to generate sketch snapshots for some
> >> > language impls.
> >> > b. Checked-in snapshots that guard the generator's behavior and let
> >> > language serde tests download the snaps instead of generating them
> >> > in place.
> >> > c. (Optionally) Some basic snap tests. Not included yet.
> >> >
> >> > [2] https://github.com/tisonkun/datasketches-testsuite
> >> >
> >> > If we can reach a consensus, I'll open an INFRA ticket to ask the
> >> > INFRA team to do the rename.
> >> >
> >> > What do you think?
> >> >
> >> > Best,
> >> > tison.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
>
