This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 1608270b Automatic Site Publish by Buildbot
1608270b is described below
commit 1608270b1358d6c05d482f21933e32cd3492a83b
Author: buildbot <[email protected]>
AuthorDate: Sat Mar 2 23:01:57 2024 +0000
Automatic Site Publish by Buildbot
---
output/docs/Architecture/LargeScale.html | 55 ++++++++++++++++++++++++++------
1 file changed, 46 insertions(+), 9 deletions(-)
diff --git a/output/docs/Architecture/LargeScale.html
b/output/docs/Architecture/LargeScale.html
index 9689ea0a..97a8be7c 100644
--- a/output/docs/Architecture/LargeScale.html
+++ b/output/docs/Architecture/LargeScale.html
@@ -514,12 +514,52 @@
-->
<h2 id="designed-for-large-scale-computing-systems">Designed for Large-scale
Computing Systems</h2>
+<h4 id="multiple-languages">Multiple Languages</h4>
+
+<ul>
+ <li>The DataSketches library is now available in three languages: Java, C++,
and Python. A fourth language, Go, is in development.</li>
+</ul>
+
+<h3
id="compatibility-across-languages-software-versions-and-binary-serialization-versions">Compatibility
Across Languages, Software Versions And Binary Serialization Versions</h3>
+<p>Large-scale computing environments may mix various platforms that use
different programming languages, each with the possibility of using a
different Software Version of our DataSketches library. Cross-version
compatibility of software is a challenge that all platforms face in general,
and it is up to the platform maintainers to keep their software up to date.
This is not new, and it is no different with the DataSketches library.</p>
+
+<p>Nonetheless, it is our goal to make it as easy as practically
possible to serialize a sketch in one of our supported languages on one
platform and to deserialize it in a different supported language, potentially
on a different, even remote, platform, and perhaps much later in time.</p>
+
+<p>With this goal in mind, here are some of the key strategic decisions we
have made in the development of the DataSketches library.</p>
+
+<h4 id="two-levels-of-versioning">Two Levels of Versioning</h4>
+
+<ul>
+ <li>
+ <p><strong>Software Version:</strong> This is the release version,
published via Apache.org and specified in the POM file or equivalent. This can
change relatively frequently based on bug fixes and introduction of new
capabilities. We follow the principles of <em>Semantic Versioning</em> as
specified by <a href="https://semver.org">semver.org</a>.</p>
+ </li>
+ <li>
+ <p><strong>Serialization Version:</strong> (<em>SerVer</em>) This is a
small integer placed in the preamble of the serialized byte array that
indicates the version of the serialized structure for the sketch. This is very
similar to Java’s <a
href="https://en.wikipedia.org/wiki/Java_class_file"><em>Class File Format
Version</em></a>. A single <em>SerVer</em> may represent multiple structures
all based on the same sketch when stored in different states, e.g., <em>Single
Item</em>, <em> [...]
+ </li>
+</ul>
+
+<p>From the user’s perspective, as long as the <em>SerVer</em> is the same,
older <em>Software Versions</em> should be able to read sketch images created
by newer <em>Software Versions</em>, although the APIs may differ.
An older <em>Software Version</em> will not be able to take advantage of new
features introduced in new <em>Software Versions</em>, but it should be able to
do what it did before. In other words, there will be no loss of access to the
serialized sketch and th [...]
+
+<p>Sketches requiring user-written custom serialize/deserialize code rely on
users to port that custom code themselves for cross-version, cross-language,
and cross-platform compatibility.</p>
+
+<h4 id="the-serialized-image-of-a-sketch">The Serialized Image of a Sketch</h4>
+<ul>
+ <li>The structure (or image) of a serialized sketch is independent of the
language in which it was created.</li>
+ <li>The sketch image only contains little-endian primitives, such as int64,
int32, int16, int8, double-64, float-32, UTF-8 strings, and simple array
structures of those. While these primitives may not be strictly equal across
languages, they can be interpreted as logically equivalent. We do
not support big-endian serialization.</li>
+ <li>The sketch image is unique for each type of sketch.</li>
+ <li>Simply speaking, a sketch image can be viewed as a blob of bytes, which
is easily stored and easily transported using many different protocols,
including Protobuf, Avro, Thrift, Base64, etc.</li>
+</ul>
+
+<p>As a result, sketches serialized in one supported language can be
interpreted by a different supported language, with the caveat that due to
language differences, availability of resources, and time to develop, not all
sketches may be available in all languages at the same time.</p>
+
<h3 id="easy-integration-with-minimal-dependencies">Easy Integration with
Minimal Dependencies</h3>
+<p>We strive to make our sketch library easy to integrate into larger systems
by keeping the number of external dependencies at a minimum.</p>
+
<ul>
<li><a
href="https://datasketches.apache.org/docs/Community/Downloads.html">Java
Core</a>
<ul>
<li>The Java core library (including Memory) has no dependencies outside
of the Java JVM at runtime allowing simple integration into virtually any Java
based system environment.</li>
- <li>All of the Java components are Maven Deployable and registered with
<a
href="https://search.maven.org/classic/#search%7Cga%7C1%7Cg%3A%22org.apache.datasketches%22">The
Central Repository</a></li>
+ <li>All of the Java components and artifacts are Maven Deployable and
registered with <a
href="https://search.maven.org/classic/#search%7Cga%7C1%7Cg%3A%22org.apache.datasketches%22">The
Central Repository</a></li>
</ul>
</li>
<li><a
href="https://datasketches.apache.org/docs/Community/Downloads.html">C++
Core</a>
@@ -534,11 +574,6 @@
</li>
</ul>
-<h3 id="cross-language-binary-compatibility">Cross Language Binary
Compatibility</h3>
-<ul>
- <li>Sketches serialized from C++ or Python can be interpreted by compatible
Java sketches and visa versa.</li>
-</ul>
-
<h3 id="speed">Speed</h3>
<ul>
<li>
@@ -555,7 +590,9 @@
</li>
</ul>
-<h3 id="systems-integrations">Systems Integrations</h3>
+<h3 id="system-integrations">System Integrations</h3>
+<p>The following are system integrations that we have been involved with, but
there are many more platform integrations out there that were performed by the
individual platform teams.</p>
+
<ul>
<li>
<p><a
href="https://datasketches.apache.org/docs/SystemIntegrations/ApacheDruidIntegration.html">Druid
Integration</a></p>
@@ -592,7 +629,7 @@ The Java sketches utilize this powerful component.</p>
<li>
<p>Built-in <b>Upper-Bound and Lower-Bound estimators</b>.
You are never in the dark about how good of an estimate the sketch is
providing.
-All the sketches are able to estimate the upper and lower bounds of the
estimate given a
+Nearly all the sketches are able to estimate the upper and lower bounds of the
estimate given a
confidence level.</p>
</li>
<li>
@@ -600,7 +637,7 @@ confidence level.</p>
tuning options.</p>
</li>
<li>
- <p><b>Small Footprint Per Sketch</b>. The operating and storage footprint
for both
+ <p><b>Small Footprint Per Sketch</b>. The in-memory run-time and storage
footprint for both
row and column oriented storage are minimized with compact binary
representations, which are much smaller
than the raw input stream and with a well defined upper bound of size.</p>
</li>
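
The serialization notes in the diff above (a small-integer SerVer in the preamble, little-endian primitives only, the image treated as an opaque blob of bytes) can be illustrated with a rough Python sketch. The 8-byte header layout, the field names, and the set of accepted SerVer values below are hypothetical and chosen only for illustration; they are not the DataSketches format, whose preamble differs for each sketch type.

```python
import struct

# HYPOTHETICAL header layout, for illustration only (not the DataSketches format):
#   byte 0    : SerVer   (uint8)
#   byte 1    : type id  (uint8)
#   bytes 2-3 : flags    (uint16, little-endian)
#   bytes 4-7 : count    (int32,  little-endian)
# followed by `count` little-endian float64 values.
HEADER = struct.Struct("<BBHi")  # "<" forces little-endian with no padding

# Assumed set of serialization versions this reader understands.
SUPPORTED_SER_VERS = {1, 2}


def serialize(ser_ver, type_id, values):
    """Pack a header and a float64 array into one little-endian byte image."""
    header = HEADER.pack(ser_ver, type_id, 0, len(values))
    body = struct.pack(f"<{len(values)}d", *values)
    return header + body


def deserialize(image):
    """Check the SerVer first, then decode the rest of the image."""
    ser_ver, type_id, _flags, count = HEADER.unpack_from(image, 0)
    if ser_ver not in SUPPORTED_SER_VERS:
        raise ValueError(f"unsupported SerVer {ser_ver}")
    values = struct.unpack_from(f"<{count}d", image, HEADER.size)
    return ser_ver, type_id, list(values)


image = serialize(2, 7, [1.5, -2.0])
print(deserialize(image))
```

Because the byte order is fixed in the format string rather than taken from the host, the same image decodes identically on any platform. A reader in another language only needs to follow the same layout and reject SerVer values it does not know.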