Except for the static vs dynamic storage, I don't think the proposal I have
above is too different from yours.  At least the capabilities each language
needs to provide are roughly the same.  Perhaps I have thought of a few
things you haven't listed and vice versa.

Detecting bitwise equality is not going to work as I explain above.
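
To make the point concrete, here is a toy sketch in Python. It is not code
from any DataSketches library: the reservoir sampler, the byte layout, and the
names make_snapshot, read_snapshot, and logically_compatible are all
hypothetical stand-ins. It assumes only that a sketch draws random numbers
while it is being built. Under that assumption two runs can serialize to
different bytes while still agreeing on everything a reader should rely on.

```python
import random
import struct


def make_snapshot(n, k, rng):
    """Toy stand-in for a sampling sketch: reservoir-sample k of n items, then serialize."""
    sample = []
    for i in range(n):
        if len(sample) < k:
            sample.append(i)
        else:
            j = rng.randrange(i + 1)  # random choice made during the build
            if j < k:
                sample[j] = i
    # Hypothetical layout: n as uint64, sample size as uint32, then the items as uint64.
    return struct.pack(f"<QI{len(sample)}Q", n, len(sample), *sample)


def read_snapshot(blob):
    """Decode the hypothetical layout written by make_snapshot."""
    n, k = struct.unpack_from("<QI", blob, 0)
    items = struct.unpack_from(f"<{k}Q", blob, 12)
    return n, k, items


def logically_compatible(a, b):
    """Compare only what should be stable: stream length, sample size, item range."""
    na, ka, ia = read_snapshot(a)
    nb, kb, ib = read_snapshot(b)
    in_range = all(0 <= x < na for x in ia) and all(0 <= x < nb for x in ib)
    return na == nb and ka == kb and in_range


if __name__ == "__main__":
    a = make_snapshot(10_000, 32, random.Random())  # unseeded, like a default generator
    b = make_snapshot(10_000, 32, random.Random())
    print("bytes equal:", a == b)
    print("logically compatible:", logically_compatible(a, b))
```

In this toy, passing the same seed to both runs makes the bytes match. Whether
the real implementations can expose such a seed is a separate question that
this example does not answer.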

On Sun, Jan 11, 2026 at 2:05 PM Lee Rhodes <[email protected]> wrote:

> See my missive: https://github.com/apache/datasketches-rust/issues/10
>
> On Thu, Jan 8, 2026 at 7:24 PM tison <[email protected]> wrote:
>
>> > Is your plan to just use this repo for .sk files?  I.e., just data.
>>
>> Nope. As you can see in [1], the snapshots will be checked in, the
>> (generator) script would be checked in as well.
>>
>> [1] https://github.com/tisonkun/datasketches-testsuite
>>
>> > Is your plan to choose one language to generate the .sk files?  Then
>> what language?  That means at least one language would have to implement
>> all sketches.  Right now that would be either Java or C++.
>>
>> Again, as shown in [1], I tend to collect snapshots in all languages,
>> so that they can be checked mutually:
>>
>> serialization_test_data/cpp_generated_files
>> serialization_test_data/java_generated_files
>> (further) serialization_test_data/go_generated_files
>> (further) serialization_test_data/rust_generated_files
>>
>> Each language impl should have its own generator logic. So far, I
>> glue that logic together with a Python script [2]. In Rust, we can
>> provide a binary target "gensnaps" to generate snapshots.
>>
>> [2]
>> https://github.com/tisonkun/datasketches-testsuite/blob/main/gensnaps.py
>>
>> That is, instead of having one impl for all sketches, in
>> datasketches-testsuite each impl provides snapshots of the sketches it
>> implements for the other impls to test against.
>>
>> > What do you mean by "Unstable snapshots"?  How are you measuring
>> "unstable"?
>>
>> See [3]. Once I checked in those snapshots, to ensure the snapshot
>> generators are correct, I regenerate them in CI and check that the
>> snapshots are bit-for-bit identical. But it seems the snapshots listed
>> above (bf, kll, etc.) change every time. I'm not sure whether that is a
>> feature of the sketches or whether we use some random value in the
>> generation logic.
>>
>> [3]
>> https://github.com/tisonkun/datasketches-testsuite/actions/runs/20801578720/job/59747439700
>>
>> Best,
>> tison.
>>
>> Lee Rhodes <[email protected]> wrote on Fri, Jan 9, 2026 at 10:15:
>> >
>> > Some thoughts and questions for clarification of your strategy:
>> >
>> > Is your plan to just use this repo for .sk files?  I.e., just data.
>> > With shared code, we would have to partition sections of the repo for
>> > different languages.
>> > Nonetheless, when we have more than one repo sharing the same language
>> > (~4 for Java now), there is an opportunity to have a place here for shared
>> > run-time code (e.g., hash functions, common math functions, common
>> > bit-twiddling code, etc.).  If we ever want to do this we might want a name
>> > that is more neutral than "testsuite".
>> >
>> > Is your plan to choose one language to generate the .sk files?  Then
>> what language?  That means at least one language would have to implement
>> all sketches.  Right now that would be either Java or C++.
>> >
>> > What do you mean by "Unstable snapshots"?  How are you measuring
>> "unstable"?
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jan 7, 2026 at 5:02 PM tison <[email protected]> wrote:
>> >>
>> >> It looks like some sketch snapshots are not stable across generation
>> >> runs. I'm not sure whether we should make them stable or, if that's
>> >> impossible, ignore the diff and optionally check that they are still
>> >> logically compatible.
>> >>
>> >> The unstable snapshots are:
>> >>
>> >> * cpp_generated_files/bf_n0_h3_cpp.sk
>> >> * cpp_generated_files/bf_n0_h5_cpp.sk
>> >> * cpp_generated_files/bf_n10000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n10000_h5_cpp.sk
>> >> * cpp_generated_files/bf_n2000000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n2000000_h5_cpp.sk
>> >> * cpp_generated_files/bf_n30000000_h3_cpp.sk
>> >> * cpp_generated_files/bf_n30000000_h5_cpp.sk
>> >> * cpp_generated_files/kll_double_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_double_n100000_cpp.sk
>> >> * cpp_generated_files/kll_double_n10000_cpp.sk
>> >> * cpp_generated_files/kll_double_n1000_cpp.sk
>> >> * cpp_generated_files/kll_float_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_float_n100000_cpp.sk
>> >> * cpp_generated_files/kll_float_n10000_cpp.sk
>> >> * cpp_generated_files/kll_float_n1000_cpp.sk
>> >> * cpp_generated_files/kll_string_n1000000_cpp.sk
>> >> * cpp_generated_files/kll_string_n100000_cpp.sk
>> >> * cpp_generated_files/kll_string_n10000_cpp.sk
>> >> * cpp_generated_files/kll_string_n1000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n1000000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n100000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n10000_cpp.sk
>> >> * cpp_generated_files/quantiles_double_n1000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n1000000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n100000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n10000_cpp.sk
>> >> * cpp_generated_files/quantiles_string_n1000_cpp.sk
>> >> * cpp_generated_files/req_float_n1000000_cpp.sk
>> >> * cpp_generated_files/req_float_n100000_cpp.sk
>> >> * cpp_generated_files/req_float_n10000_cpp.sk
>> >> * cpp_generated_files/req_float_n1000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n1000000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n100000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n10000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n1000_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_n100_cpp.sk
>> >> * cpp_generated_files/varopt_sketch_long_sampling_cpp.sk
>> >> * cpp_generated_files/varopt_union_double_sampling_cpp.sk
>> >>
>> >> * java_generated_files/bf_n0_h3_java.sk
>> >> * java_generated_files/bf_n0_h5_java.sk
>> >> * java_generated_files/bf_n10000_h3_java.sk
>> >> * java_generated_files/bf_n10000_h5_java.sk
>> >> * java_generated_files/bf_n2000000_h3_java.sk
>> >> * java_generated_files/bf_n2000000_h5_java.sk
>> >> * java_generated_files/bf_n30000000_h3_java.sk
>> >> * java_generated_files/bf_n30000000_h5_java.sk
>> >> * java_generated_files/kll_double_n1000000_java.sk
>> >> * java_generated_files/kll_double_n100000_java.sk
>> >> * java_generated_files/kll_double_n10000_java.sk
>> >> * java_generated_files/kll_double_n1000_java.sk
>> >> * java_generated_files/kll_float_n1000000_java.sk
>> >> * java_generated_files/kll_float_n100000_java.sk
>> >> * java_generated_files/kll_float_n10000_java.sk
>> >> * java_generated_files/kll_float_n1000_java.sk
>> >> * java_generated_files/kll_long_n1000000_java.sk
>> >> * java_generated_files/kll_long_n100000_java.sk
>> >> * java_generated_files/kll_long_n10000_java.sk
>> >> * java_generated_files/kll_long_n1000_java.sk
>> >> * java_generated_files/kll_string_n1000000_java.sk
>> >> * java_generated_files/kll_string_n100000_java.sk
>> >> * java_generated_files/kll_string_n10000_java.sk
>> >> * java_generated_files/kll_string_n1000_java.sk
>> >> * java_generated_files/quantiles_double_n1000000_java.sk
>> >> * java_generated_files/quantiles_double_n100000_java.sk
>> >> * java_generated_files/quantiles_double_n10000_java.sk
>> >> * java_generated_files/quantiles_double_n1000_java.sk
>> >> * java_generated_files/quantiles_string_n1000000_java.sk
>> >> * java_generated_files/quantiles_string_n100000_java.sk
>> >> * java_generated_files/quantiles_string_n10000_java.sk
>> >> * java_generated_files/quantiles_string_n1000_java.sk
>> >> * java_generated_files/req_float_n1000000_java.sk
>> >> * java_generated_files/req_float_n100000_java.sk
>> >> * java_generated_files/req_float_n10000_java.sk
>> >> * java_generated_files/req_float_n1000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n1000000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n100000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n10000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n1000_java.sk
>> >> * java_generated_files/varopt_sketch_long_n100_java.sk
>> >> * java_generated_files/varopt_sketch_long_sampling_java.sk
>> >> * java_generated_files/varopt_union_double_sampling_java.sk
>> >>
>> >> Best,
>> >> tison.
>> >>
>> >> tison <[email protected]> wrote on Thu, Jan 8, 2026 at 08:53:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Following up on the discussion [1], I'd like to seek consensus to
>> >> > rename our existing but unused repo
>> >> > https://github.com/apache/datasketches-java-common to
>> >> > datasketches-testsuite to hold a shared snapshot generator and
>> >> > (optional) serde tests.
>> >> >
>> >> > [1] https://github.com/apache/datasketches-rust/issues/10
>> >> >
>> >> > Here is a repo link that would replace the current content [2]. It
>> contains:
>> >> >
>> >> > a. A script (gensnaps.py) to generate sketch snapshots for some
>> language impls.
>> >> > b. Checked-in snapshots that can guard the generator behavior, and
>> >> > that language serde tests can easily download instead of
>> >> > generating them in place.
>> >> > c. (Optionally) Run some basic snap tests. Not included yet.
>> >> >
>> >> > [2] https://github.com/tisonkun/datasketches-testsuite
>> >> >
>> >> > If we can reach a consensus, I'll open an INFRA ticket to ask the
>> >> > INFRA team to do the rename.
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Best,
>> >> > tison.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>>
