Thanks!

On Sat, Mar 2, 2024 at 2:23 PM freakyzoidberg (via GitHub) <[email protected]>
wrote:

>
> freakyzoidberg commented on code in PR #163:
> URL:
> https://github.com/apache/datasketches-website/pull/163#discussion_r1510085541
>
>
> ##########
> docs/Architecture/LargeScale.md:
> ##########
> @@ -21,20 +21,47 @@ layout: doc_page
>  -->
>  ## Designed for Large-scale Computing Systems
>
> +#### Multiple Languages
> +
> +* The DataSketches library is now available in three languages, Java,
> C++, and Python. A forth language, GoLang, is in development.
> +
> +
> +### Compatibility Across Languages, Software Versions And Binary
> Serialization Versions
> +Large-scale computing environments may have a mix of various platforms
> utilizing different programming languages each with the possibility of using
> different Software Versions of our DataSketches library.  Cross version
> compatibility of software is a challenge that all platforms face in
> general, and it is up to the platform maintainers to keep their software
> up-to-date. This is not new, and it is no different for the DataSketches library.
> +
> +Nonetheless, it is our goal to strive to make it as easy as practically
> possible to serialize our sketches in one of our supported languages on one
> platform and to be deserialized in a different supported language,
> potentially on a different, even remote platform, and perhaps much later in
> time.
> +
> +With this goal in mind, here are some of the key strategic decisions we
> have made in the development of the DataSketches library.
> +
> +#### Two levels of versioning.
> +
> +* **Software Version:** This is the release version, published via
> Apache.org and specified in the POM file or equivalent. This can change
> relatively frequently based on bug fixes and introduction of new
> capabilities. We follow the principles of *Semantic Versioning* as
> specified by [semver.org](https://semver.org).
> +
> +* **Serialization Version:** (*SerVer*) This is a small integer placed in
> the preamble of the serialized byte array that indicates the version of the
> serialized structure for the sketch. This is very similar to Java's [*Class
> File Format Version*](https://en.wikipedia.org/wiki/Java_class_file). A
> single *SerVer* may represent multiple structures all based on the same
> sketch when stored in different states (e.g., *Single Item*, *Compact*,
> *Updatable*, etc). This *SerVer* changes very rarely, if at all. Of all of
> our sketches, only a few, e.g., Theta, KLL and Sampling, have had more than
> one *SerVer* over time. There are and will be many *Software Versions* of
> the same sketch that still use the same *SerVer*. When we have to update
> the *SerVer*, we provide the capability in the *Software Version* of the
> code associated with the new *SerVer* the ability to read and convert the
> old *SerVer* to the new *SerVer*. This is why our newest *Software
> Versions* can still read and interpret older
> *SerVer* serialized sketches that go back to when our project was
> started at Yahoo (2012), and before we went open-source (2015). Technically
> speaking this can be described as *Backward-Transitive* compatibility (see
> [Schema Evolution and Compatibility](
> https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html)
> and [Schema Evolution](https://en.wikipedia.org/wiki/Schema_evolution)).
> +
> +From the user's perspective, as long as the *SerVer* is the same, older
> *Software Versions* should be able to read sketch images created by newer
> *Software Versions*. But the APIs may be different, obviously. An older
> *Software Version* will not be able to take advantage of new features
> introduced in new *Software Versions*, but it should be able to do what it
> did before. In other words, there will be no loss of access to the
> serialized sketch and the older *Software Version* capabilities. A user
> should not need to access the *SerVer*, nonetheless it is always stored in
> index one of the serialized image. If a sketch is presented with a *SerVer*
> that it is not compatible with, the sketch should throw an exception and
> say what the problem is, just like Java does with its *Class File Format
> Versions*.
> +
> +#### The Serialized Image of a Sketch
> +* The structure (or image) of a serialized sketch is independent of the
> language from which it was created.
> +* The sketch image only contains little-endian primitives, such as int64,
> int32, int16, int8, double-64, float-32, UTF-8 strings, and simple array
> structures of those, which can be easily interpreted in many languages on
> modern CPUs. We do not support big-endian serialization.
> +* The sketch image is unique for each type of sketch.
> +* Simply speaking, a sketch image can be viewed as a blob of bytes, which
> is easily stored and easily transported using many different protocols,
> including Protobuf, Avro, Thrift, Base64, etc.
> +
>
> Review Comment:
>    Should we clarify that for some sketches (FrequencyItemSketch iirc) the
> serialised form may not be strictly equal across languages but still be
> logically equivalent?
>
>
>
>
>
> ##########
> docs/Architecture/LargeScale.md:
> ##########
> @@ -21,20 +21,47 @@ layout: doc_page
>  -->
>  ## Designed for Large-scale Computing Systems
>
> +#### Multiple Languages
> +
> +* The DataSketches library is now available in three languages, Java,
> C++, and Python. A forth language, GoLang, is in development.
>
> Review Comment:
>    `A fourth language, Go, is in development.`
>
>    Typo in forth/fourth
>    GoLang is the former name of Go https://go.dev/doc/faq#go_or_golang
>
>
>
> --
> This is an automated message from the Apache Git Service.
> To respond to the message, please log on to GitHub and use the
> URL above to go to the specific comment.
>
> To unsubscribe, e-mail: [email protected]
>
> For queries about this service, please contact Infrastructure at:
> [email protected]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
