> Is your plan to just use this repo for .sk files?  I.e., just data.

Nope. As you can see in [1], the snapshots will be checked in, and the
generator script will be checked in as well.

[1] https://github.com/tisonkun/datasketches-testsuite

> Is your plan to choose one language to generate the .sk files?  Then what 
> language?  That means at least one language would have to implement all 
> sketches.  Right now that would be either Java or C++.

Again, as shown in [1], I intend to collect snapshots from all languages,
so that they can be cross-checked against each other:

serialization_test_data/cpp_generated_files
serialization_test_data/java_generated_files
(future) serialization_test_data/go_generated_files
(future) serialization_test_data/rust_generated_files

Each language impl should have its own generator logic. So far, I
glue the generators together with a Python script [2]. In Rust, we can
provide a binary target "gensnaps" to generate snapshots.

[2] https://github.com/tisonkun/datasketches-testsuite/blob/main/gensnaps.py

That is, instead of relying on one impl that covers all sketches, in
datasketches-testsuite each impl provides snapshots of the sketches it
implements, so that the other impls can test against them.
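
To make the "each impl provides what it implements" idea concrete, here
is a rough sketch of how one could tabulate which languages provide which
snapshot. It assumes the naming seen in the repo so far
(<lang>_generated_files/<name>_<lang>.sk); the helper names snapshot_key
and coverage are made up for illustration and are not part of
gensnaps.py. It needs Python 3.9+.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def snapshot_key(path: str) -> tuple[str, str]:
    """Split '<lang>_generated_files/<name>_<lang>.sk' into (lang, name).

    Assumes the directory and file naming convention used in the thread.
    """
    p = PurePosixPath(path)
    lang = p.parent.name.removesuffix("_generated_files")
    name = p.stem.removesuffix(f"_{lang}")
    return lang, name


def coverage(paths: list[str]) -> dict[str, set[str]]:
    """Map each snapshot name to the set of languages that provide it."""
    table: dict[str, set[str]] = defaultdict(set)
    for path in paths:
        lang, name = snapshot_key(path)
        table[name].add(lang)
    return dict(table)
```

A snapshot whose set contains only one language (for example kll_long,
which so far appears only under java_generated_files) is simply one that
the other impls can read but do not generate themselves.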

> What do you mean by "Unstable snapshots"?  How are you measuring "unstable"?

See [3]. After checking in those snapshots, I regenerate them in CI and
check that the results are bit-for-bit identical to the checked-in
files, to make sure the snapshot generators are correct. But it seems
the snapshots listed in my previous message (bf, kll, etc.) change on
every run. I'm not sure whether that is inherent to those sketches or
whether we use some random value in the generation logic.

[3] https://github.com/tisonkun/datasketches-testsuite/actions/runs/20801578720/job/59747439700
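
For reference, the bit-for-bit check can be sketched roughly as below.
This is only an illustration of the idea, not the code that runs in CI:
the function names and the two-directory layout (one checked-in tree,
one freshly regenerated tree) are assumptions. It needs Python 3.9+.

```python
import hashlib
from pathlib import Path


def digest_snapshots(root: Path) -> dict[str, str]:
    """Map each .sk file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.sk"))
    }


def unstable_snapshots(checked_in: Path, regenerated: Path) -> list[str]:
    """Return relative paths that differ in content or exist on only one side."""
    old = digest_snapshots(checked_in)
    new = digest_snapshots(regenerated)
    # A file missing on one side yields None from .get() and counts as a diff.
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

An empty result means every regenerated snapshot matches the checked-in
copy byte for byte; any path in the result is "unstable" in the sense
used here.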

Best,
tison.

Lee Rhodes <[email protected]> wrote on Fri, Jan 9, 2026 at 10:15:
>
> Some thoughts and questions for clarification of your strategy:
>
> Is your plan to just use this repo for .sk files?  I.e., just data.
> With shared code, we would have to partition sections of the repo for 
> different languages.
> Nonetheless, when we have more than one repo sharing the same language (~4 
> for Java now), there is an opportunity to have a place here for shared 
> run-time code (e.g., Hash functions, common math functions, common 
> bit-twiddling code, etc).  If we ever want to do this we might want a name 
> that is more neutral than "testsuite"
>
> Is your plan to choose one language to generate the .sk files?  Then what 
> language?  That means at least one language would have to implement all 
> sketches.  Right now that would be either Java or C++.
>
> What do you mean by "Unstable snapshots"?  How are you measuring "unstable"?
>
>
>
>
>
> On Wed, Jan 7, 2026 at 5:02 PM tison <[email protected]> wrote:
>>
>> Looks like some sketches are unstable for generating. Not sure if we
>> should make them stable; or, if that's impossible, ignore the diff and
>> optionally check they are still logically compatible.
>>
>> The unstable snapshots are:
>>
>> * cpp_generated_files/bf_n0_h3_cpp.sk
>> * cpp_generated_files/bf_n0_h5_cpp.sk
>> * cpp_generated_files/bf_n10000_h3_cpp.sk
>> * cpp_generated_files/bf_n10000_h5_cpp.sk
>> * cpp_generated_files/bf_n2000000_h3_cpp.sk
>> * cpp_generated_files/bf_n2000000_h5_cpp.sk
>> * cpp_generated_files/bf_n30000000_h3_cpp.sk
>> * cpp_generated_files/bf_n30000000_h5_cpp.sk
>> * cpp_generated_files/kll_double_n1000000_cpp.sk
>> * cpp_generated_files/kll_double_n100000_cpp.sk
>> * cpp_generated_files/kll_double_n10000_cpp.sk
>> * cpp_generated_files/kll_double_n1000_cpp.sk
>> * cpp_generated_files/kll_float_n1000000_cpp.sk
>> * cpp_generated_files/kll_float_n100000_cpp.sk
>> * cpp_generated_files/kll_float_n10000_cpp.sk
>> * cpp_generated_files/kll_float_n1000_cpp.sk
>> * cpp_generated_files/kll_string_n1000000_cpp.sk
>> * cpp_generated_files/kll_string_n100000_cpp.sk
>> * cpp_generated_files/kll_string_n10000_cpp.sk
>> * cpp_generated_files/kll_string_n1000_cpp.sk
>> * cpp_generated_files/quantiles_double_n1000000_cpp.sk
>> * cpp_generated_files/quantiles_double_n100000_cpp.sk
>> * cpp_generated_files/quantiles_double_n10000_cpp.sk
>> * cpp_generated_files/quantiles_double_n1000_cpp.sk
>> * cpp_generated_files/quantiles_string_n1000000_cpp.sk
>> * cpp_generated_files/quantiles_string_n100000_cpp.sk
>> * cpp_generated_files/quantiles_string_n10000_cpp.sk
>> * cpp_generated_files/quantiles_string_n1000_cpp.sk
>> * cpp_generated_files/req_float_n1000000_cpp.sk
>> * cpp_generated_files/req_float_n100000_cpp.sk
>> * cpp_generated_files/req_float_n10000_cpp.sk
>> * cpp_generated_files/req_float_n1000_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_n1000000_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_n100000_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_n10000_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_n1000_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_n100_cpp.sk
>> * cpp_generated_files/varopt_sketch_long_sampling_cpp.sk
>> * cpp_generated_files/varopt_union_double_sampling_cpp.sk
>>
>> * java_generated_files/bf_n0_h3_java.sk
>> * java_generated_files/bf_n0_h5_java.sk
>> * java_generated_files/bf_n10000_h3_java.sk
>> * java_generated_files/bf_n10000_h5_java.sk
>> * java_generated_files/bf_n2000000_h3_java.sk
>> * java_generated_files/bf_n2000000_h5_java.sk
>> * java_generated_files/bf_n30000000_h3_java.sk
>> * java_generated_files/bf_n30000000_h5_java.sk
>> * java_generated_files/kll_double_n1000000_java.sk
>> * java_generated_files/kll_double_n100000_java.sk
>> * java_generated_files/kll_double_n10000_java.sk
>> * java_generated_files/kll_double_n1000_java.sk
>> * java_generated_files/kll_float_n1000000_java.sk
>> * java_generated_files/kll_float_n100000_java.sk
>> * java_generated_files/kll_float_n10000_java.sk
>> * java_generated_files/kll_float_n1000_java.sk
>> * java_generated_files/kll_long_n1000000_java.sk
>> * java_generated_files/kll_long_n100000_java.sk
>> * java_generated_files/kll_long_n10000_java.sk
>> * java_generated_files/kll_long_n1000_java.sk
>> * java_generated_files/kll_string_n1000000_java.sk
>> * java_generated_files/kll_string_n100000_java.sk
>> * java_generated_files/kll_string_n10000_java.sk
>> * java_generated_files/kll_string_n1000_java.sk
>> * java_generated_files/quantiles_double_n1000000_java.sk
>> * java_generated_files/quantiles_double_n100000_java.sk
>> * java_generated_files/quantiles_double_n10000_java.sk
>> * java_generated_files/quantiles_double_n1000_java.sk
>> * java_generated_files/quantiles_string_n1000000_java.sk
>> * java_generated_files/quantiles_string_n100000_java.sk
>> * java_generated_files/quantiles_string_n10000_java.sk
>> * java_generated_files/quantiles_string_n1000_java.sk
>> * java_generated_files/req_float_n1000000_java.sk
>> * java_generated_files/req_float_n100000_java.sk
>> * java_generated_files/req_float_n10000_java.sk
>> * java_generated_files/req_float_n1000_java.sk
>> * java_generated_files/varopt_sketch_long_n1000000_java.sk
>> * java_generated_files/varopt_sketch_long_n100000_java.sk
>> * java_generated_files/varopt_sketch_long_n10000_java.sk
>> * java_generated_files/varopt_sketch_long_n1000_java.sk
>> * java_generated_files/varopt_sketch_long_n100_java.sk
>> * java_generated_files/varopt_sketch_long_sampling_java.sk
>> * java_generated_files/varopt_union_double_sampling_java.sk
>>
>> Best,
>> tison.
>>
>> tison <[email protected]> wrote on Thu, Jan 8, 2026 at 08:53:
>> >
>> > Hi,
>> >
>> > Following up on the discussion [1], I'd like to seek consensus to
>> > rename our existing but unused repo
>> > https://github.com/apache/datasketches-java-common to
>> > datasketches-testsuite to hold shared snapshot generator and
>> > (optional) serde tests.
>> >
>> > [1] https://github.com/apache/datasketches-rust/issues/10
>> >
>> > Here is a repo link that would replace the current content [2]. It contains:
>> >
>> > a. A script (gensnaps.py) to generate sketch snapshots for some language impls.
>> > b. Checked-in snapshots that can guard the generator behavior, and for
>> > language serde tests to easily download the snaps instead of
>> > generating in place.
>> > c. (Optionally) Run some basic snap tests. Not included yet.
>> >
>> > [2] https://github.com/tisonkun/datasketches-testsuite
>> >
>> > If we can reach a consensus, I'll open an INFRA ticket to ask the
>> > INFRA team to do the rename.
>> >
>> > What do you think?
>> >
>> > Best,
>> > tison.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
