leerho commented on code in PR #162:
URL:
https://github.com/apache/datasketches-website/pull/162#discussion_r1509520250
##########
docs/Architecture/LargeScale.md:
##########
@@ -33,7 +33,9 @@ layout: doc_page
* The C++ Core is extended using the python binding library
[pybind11](https://github.com/pybind/pybind11) enabling high performance
operation from Python.
### Cross Language Binary Compatibility
-* Sketches serialized from C++ or Python can be interpreted by compatible Java
sketches and visa versa.
+* Sketches serialized from C++ or Python can be interpreted by compatible Java
sketches and visa versa.
+
+* All sketches have a serialized form which is able to be deserialized by any
version of the library since the sketch was introduced.
Review Comment:
“Prediction is very difficult, especially if it's about the future!” --
Niels Bohr.
I don't know of any software that guarantees forward compatibility forever,
which would mean that old code can always read structures created by future
code. Even international standards bodies don't guarantee that. The Java
language doesn't guarantee that with its class version IDs. Incompatible
changes can occur for many reasons, including changes required for security,
obsolescence of language features, or new capabilities that were not imagined
when the original code was created.
We recognize the challenge in large system environments, where different
languages and different platforms may all be using different versions of the
software. We are trying our best to provide capabilities that at least allow
these large environments to interchange serialized sketches across languages
and platforms efficiently. We are not aware of any other open-source sketch
library that even provides this capability. Cross-version compatibility of
software is a challenge that all platforms face in general. It is up to the
platform maintainers to keep their software up-to-date; this is not new and
not different here.
Nonetheless, to put your mind somewhat at ease, realize that we have two
levels of versioning in our library (this is true across all of our languages):
- **Software Version**: This is the release version, published via Apache
and specified in the POM file or equivalent. It can change relatively
frequently based on bug fixes and the introduction of new capabilities. Here,
we try very hard to obey the principles of Semantic Versioning as specified by
[semver.org](https://semver.org).
- **Serialization Version** (SerVer): This is a small integer placed in the
preamble of the serialized byte array that indicates the version of the
serialized structure for the sketch. A single SerVer may represent multiple
structures, all based on the same sketch when stored in different states
(e.g., Single Item, Compact, Updatable, etc.). This SerVer changes VERY rarely,
if at all. Of all of our sketches, only three (Theta, KLL, and Sampling) have
more than one SerVer. There are and will be many Software Versions of the same
sketch that still use the same SerVer. When we are forced (rarely) to update
the SerVer, the Software Version that introduces the new SerVer also provides
the ability to read the old SerVer and convert it to the new one. This is why
our newest Software Versions can still read and interpret older SerVer
serialized sketches that go back to when our project was started at Yahoo
(2012), before we went open-source (2015).
This means that as long as the SerVer is the same, older Software Versions
should be able to read sketch images created by newer Software Versions. The
APIs may differ, of course: an older Software Version will not be able to take
advantage of features introduced in newer Software Versions, but it should
still be able to do what it did before. In other words, there will be no loss
of access to the serialized sketch or to the capabilities of the older
Software Version. As a user, you don't need to worry about, or be able to
access, the SerVer. If a sketch is presented with a newer SerVer that it is
not compatible with, the sketch should throw an exception that says what the
problem is, just like Java does.
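
To make the SerVer behavior described above concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions, not the library's
code: the one-byte SerVer at offset 0, the two toy layouts, and the names
`serialize_v1`, `serialize_v2`, `deserialize`, and `UnsupportedSerVerError` are
all hypothetical and do not match the real preamble of any DataSketches sketch.

```python
import struct

# Hypothetical: the newest serialization version this reader understands.
CURRENT_SER_VER = 2


class UnsupportedSerVerError(ValueError):
    """Raised when an image carries a SerVer the reader does not support."""


def serialize_v1(count: int) -> bytes:
    # Toy SerVer 1 layout (assumption): [ser_ver: u8][count: u32, little-endian]
    return struct.pack("<BI", 1, count)


def serialize_v2(count: int) -> bytes:
    # Toy SerVer 2 layout (assumption): [ser_ver: u8][count: u64, little-endian]
    return struct.pack("<BQ", 2, count)


def deserialize(image: bytes, max_ser_ver: int = CURRENT_SER_VER) -> int:
    """Read a toy sketch image, accepting every SerVer up to max_ser_ver."""
    if not image:
        raise ValueError("empty image")
    ser_ver = image[0]  # the SerVer sits in the preamble of the byte array
    if ser_ver < 1 or ser_ver > max_ser_ver:
        # A reader cannot interpret a layout it has never seen, so it fails
        # loudly and says why instead of guessing.
        raise UnsupportedSerVerError(
            f"image has SerVer {ser_ver}; this reader supports 1..{max_ser_ver}"
        )
    if ser_ver == 1:
        # Old layout: read it and widen the value to the current representation.
        (count,) = struct.unpack_from("<I", image, 1)
        return count
    (count,) = struct.unpack_from("<Q", image, 1)
    return count
```

With `max_ser_ver=2` the reader accepts both toy layouts, which mirrors a new
Software Version reading old images. With `max_ser_ver=1` it still reads any
SerVer 1 image, whichever Software Version wrote it, but raises
`UnsupportedSerVerError` on a SerVer 2 image.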
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]