The upcoming 9.0.0 release of the DataSketches-java library gives us an opportunity to clean up and refactor parts of the existing code, which should make the code base easier to understand and use.
Some of the code has been around for over 10 years, and the library has changed considerably since then. This note is an early heads-up on some of the major changes coming to the library. A more complete summary of the changes will be published at the time of release.

- Java 25 is the first Java LTS release that includes the finalized FFM (Foreign Function & Memory) API, which gives us an opportunity to obsolete the entire DataSketches-Memory repository. The DS-Memory repo has served us well for over 10 years by providing fast access to off-heap data structures. But FFM now provides that capability directly as part of the Java platform. As a result, DS-Memory will be completely replaced with FFM in the DS-Java 9.0.0 release.

Our backward compatibility promise is primarily focused on binary compatibility: being able to read serializations produced by earlier versions of the code with our latest versions. This is still true. However, until recently we have also supported binary compatibility with very early code versions at Yahoo that existed long before our code was first open-sourced in August of 2015 (we became part of ASF in 2020). Since Yahoo is in the process of obsoleting the systems that used that code, there is no reason for us to keep supporting it.

- With DS-Java 9.0.0, we plan to remove the complex code that allowed us to read serializations from that old code. If someone should need to read sketches serialized prior to August of 2015, it is still possible to do so with DS-Java versions prior to 9.0.0, e.g., 8.0.0. Those sketches can then be re-serialized, and the result can be read with DS-Java 9.0.0 and beyond.

The Theta Sketch family was the first family of sketches we developed, and as a result it has some peculiarities. For example, the root class is called "Sketch" (because it was the only one). As we developed other sketch families, we made the root class names carry more meaningful information, e.g., HllSketch, CpcSketch, KllSketch, etc.

- With DS-Java 9.0.0, we will be bringing the Theta Sketch family up to the same standard as the other sketches, so the root class will be called ThetaSketch, and its child classes will be renamed accordingly.

The Theta Sketch family is one of the largest families of sketches, with many classes, many of which just "evolved", unfortunately with a lot of duplication and incomplete coverage. It is hard to understand all the relationships and where to find the function you might be looking for.

- With DS-Java 9.0.0, we will try to remove much of the duplication and provide better documentation to help you find your way around.

We would certainly like to hear from you.
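
As a rough illustration of the kind of off-heap access that FFM now provides directly, here is a minimal sketch that uses only the standard `java.lang.foreign` API (final since Java 22). The class name `FfmOffHeapExample` and the method `sumOffHeap` are hypothetical names chosen for this example. They are not part of DataSketches, and this is not meant to show how DS-Java 9.0.0 uses FFM internally.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical example class; not part of the DataSketches library.
class FfmOffHeapExample {

    // Writes the squares 0, 1, 4, ... of the first `count` integers into
    // off-heap memory as longs, reads them back, and returns their sum.
    static long sumOffHeap(int count) {
        // A confined arena releases its off-heap memory when it is closed.
        try (Arena arena = Arena.ofConfined()) {
            // Allocate count * 8 bytes off-heap, aligned for 8-byte longs.
            MemorySegment segment =
                arena.allocate((long) count * Long.BYTES, Long.BYTES);

            for (int i = 0; i < count; i++) {
                segment.setAtIndex(ValueLayout.JAVA_LONG, i, (long) i * i);
            }

            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += segment.getAtIndex(ValueLayout.JAVA_LONG, i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(4)); // 0 + 1 + 4 + 9 = 14
    }
}
```

The try-with-resources block matters here: once the arena is closed, the segment's memory is freed and any further access to it throws an exception, so the lifetime of the off-heap data is explicit.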
