This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 5efd6b11 Automatic Site Publish by Buildbot
5efd6b11 is described below
commit 5efd6b11a4842da9e0d119dfd3be52a5ab7d3066
Author: buildbot <[email protected]>
AuthorDate: Fri Sep 30 14:54:42 2022 +0000
Automatic Site Publish by Buildbot
---
output/docs/Community/Research.html | 1 +
output/docs/KLL/KLLAccuracyAndSize.html | 2 +-
output/docs/Quantiles/QuantilesOverview.html | 47 +++++++++++++++++-----
output/docs/Quantiles/QuantilesReferences.html | 1 +
output/docs/Quantiles/QuantilesSketchOverview.html | 10 ++---
.../SketchingQuantilesAndRanksTutorial.html | 30 ++++++++------
6 files changed, 60 insertions(+), 31 deletions(-)
diff --git a/output/docs/Community/Research.html
b/output/docs/Community/Research.html
index 0661a401..dbef74a9 100644
--- a/output/docs/Community/Research.html
+++ b/output/docs/Community/Research.html
@@ -636,6 +636,7 @@ All algorithms in the library produce mergeable summaries,
and come with formal
<p><strong>[VSGB05]</strong> Shobha Venkataraman, Dawn Xiaodong Song, Phillip
B. Gibbons, and Avrim Blum. New streaming algorithms for fast detection of
superspreaders. In <em>Internet Society NDSS Proceedings</em>, 2005.</p>
+<p><strong>[CKLTV]</strong> Graham Cormode, Zohar Karnin, Edo Liberty, Justin
Thaler, Pavel Veselý. Relative Error Streaming Quantiles. In <em>PODS ‘21
Proceedings</em>, 2021.</p>
</div> <!-- End content -->
</div> <!-- End row -->
diff --git a/output/docs/KLL/KLLAccuracyAndSize.html
b/output/docs/KLL/KLLAccuracyAndSize.html
index 0bd07a3d..b718455f 100644
--- a/output/docs/KLL/KLLAccuracyAndSize.html
+++ b/output/docs/KLL/KLLAccuracyAndSize.html
@@ -513,7 +513,7 @@
<p>The accuracy of a quantile sketch is a function of the configured value
<i>K</i>, which also affects
the overall size of the sketch (default K = 200).</p>
-<p>Accuracy for quantiles sketches is specified and measured with respect to
the <em>rank</em> only, not the values.</p>
+<p>The accuracy quantiles sketches is specified and measured with respect to
the <em>rank</em> only, not the values.</p>
<p>The KLL Sketch has <em>absolute error</em>. For example, a specified
accuracy of 1% at the median (rank = 0.50) means that the true value (if you
could extract it from the set) should be
between <em>getQuantile(0.49)</em> and <em>getQuantile(0.51)</em>. This same
1% error applied at a rank of 0.95 means that the true value should be between
<em>getQuantile(0.94)</em> and <em>getQuantile(0.96)</em>. In other words, the
error is a fixed +/- epsilon for the entire range of rank values.</p>
diff --git a/output/docs/Quantiles/QuantilesOverview.html
b/output/docs/Quantiles/QuantilesOverview.html
index 12b30c19..3524a45f 100644
--- a/output/docs/Quantiles/QuantilesOverview.html
+++ b/output/docs/Quantiles/QuantilesOverview.html
@@ -512,7 +512,13 @@
<h2 id="quantile-type-sketches-in-the-library">Quantile-type sketches in the
library</h2>
-<p>There are three types of quantiles sketches in the library. These sketches
have many parallel methods. Please refer to the individual Javadocs for more
information.</p>
+<p>This is an overview of the three types of quantiles sketches in the
library. Each of these quantile types may have one or more specific
implementtaions.</p>
+
+<p>The mathematical error bounds of all the quantile sketches is specified
with respect to rank and not with respect to quantile values. In other words,
the difference between the rank upper bound and the rank lower bound is the
confidence interval and can be expressed as a percent of the overall rank
distribution (which is 1.0) and is the mathematically derived error for a
specific configuration of the sketch.</p>
+
+<p>Although the quantile upper bound and quantile lower bounds can be
approximately computed from the rank upper bound and rank lower bound, and the
difference between the quantile bounds is also an approximate confidence
interval, the size of the quantile confidence interval may not be meaningful
and is not constrained by the defined error of the sketch.</p>
+
+<p>These sketches have many parallel methods. Please refer to the individual
Javadocs for more information.</p>
<h3 id="the-req-sketch">The REQ Sketch</h3>
@@ -531,16 +537,16 @@
<li>Directory: req</li>
</ul>
</li>
- <li>Key Features
+ <li>Key Features (both Java & C++)
<ul>
<li>Accuracy %: a function of <em>K</em> and <strong>relative</strong>
with respect to rank. The user can select either High Rank Accuracy (HRA) or
Low Rank Accuracy (LRA). This enables extremely high accuracy for the ends of
the rank domain. E.g., 99.999%ile quantiles.</li>
- <li>User selectable comparison criteria:
+ <li>User selectable comparison QuantileSearchCriteria:
<ul>
- <li>Less-Than (LT), which is compatible with the KLL and older
Quantiles Sketch</li>
- <li>Less-Than-or-Equals (LE), a common definition in some of the
theoretical literature.</li>
+ <li>Exclusive, which is compatible with the KLL and older Quantiles
Sketch</li>
+ <li>Inclusive, a common definition in some of the theoretical
literature.</li>
</ul>
</li>
- <li>Java: Dedicated <em>float</em> implementation</li>
+ <li>Java: Dedicated <em>float</em> implementation.</li>
<li>C++: Template implementation for arbitrary comparible types.</li>
<li>Python: Dedicated <em>float</em> and <em>integer</em>
implementations.</li>
</ul>
@@ -564,11 +570,17 @@
<li>Directory: kll</li>
</ul>
</li>
- <li>Key Features
+ <li>Key Features (both Java & C++)
<ul>
- <li>Accuracy %: a function of <em>K</em> and independent of rank.*</li>
+ <li>User selectable comparison QuantileSearchCriteria:
+ <ul>
+ <li>Exclusive, which is compatible with the KLL and older Quantiles
Sketch</li>
+ <li>Inclusive, a common definition in some of the theoretical
literature.</li>
+ </ul>
+ </li>
+ <li>Accuracy %: a function of <em>K</em> and independent of rank.</li>
<li>Near optimal accuracy per sketch size compared to other constant
accuracy quantiles sketches.</li>
- <li>Java: Dedicated <em>float</em> implementation</li>
+ <li>Java: Dedicated <em>float</em> and <em>double</em>
implementations.</li>
<li>C++: Template implementation for arbitrary comparible types.</li>
<li>Python: Dedicated <em>float</em> and <em>integer</em>
implementations</li>
</ul>
@@ -585,10 +597,23 @@
<li>Package: org.apache.datasketches.quantiles</li>
</ul>
</li>
- <li>Key Features
+ <li>C++, Python
+ <ul>
+ <li>Release 1.0.0, 17 Sep 2019</li>
+ <li>Repo: <a
href="https://github.com/apache/datasketches-cpp">https://github.com/apache/datasketches-cpp</a></li>
+ <li>Directory:</li>
+ </ul>
+ </li>
+ <li>Key Features (both Java & C++)
<ul>
+ <li>User selectable comparison QuantileSearchCriteria:
+ <ul>
+ <li>Exclusive, which is compatible with the KLL and older Quantiles
Sketch</li>
+ <li>Inclusive, a common definition in some of the theoretical
literature.</li>
+ </ul>
+ </li>
<li>Accuracy %: a function of <em>K</em> and independent of rank.</li>
- <li>Dedicated <em>double</em> implentation</li>
+ <li>Dedicated <em>double</em> implentation.</li>
<li>java generic implementation for arbitrary comparible types.</li>
<li>The dedicated <em>double</em> implementation can be configured for
off-heap operation.</li>
</ul>
diff --git a/output/docs/Quantiles/QuantilesReferences.html
b/output/docs/Quantiles/QuantilesReferences.html
index baf37cc9..1787e3c5 100644
--- a/output/docs/Quantiles/QuantilesReferences.html
+++ b/output/docs/Quantiles/QuantilesReferences.html
@@ -520,6 +520,7 @@
<li>In J. Gehrke M. Garofalakis and R. Rastogi, editors, In Data Stream
Management: Processing High-Speed Data Streams. Springer, 2016.</li>
<li>David Felber and Rafail Ostrovsky. A randomized online quantile summary
in O((1/ε) log(1/ε))</li>
<li>Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý.
Relative Error Streaming Quantiles. https://arxiv.org/abs/2004.01668.</li>
+ <li>Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý.
Relative Error Streaming Quantiles. In <em>PODS ‘21 Proceedings</em>, 2021.</li>
</ul>
</div> <!-- End content -->
diff --git a/output/docs/Quantiles/QuantilesSketchOverview.html
b/output/docs/Quantiles/QuantilesSketchOverview.html
index 2c0b7253..f2339189 100644
--- a/output/docs/Quantiles/QuantilesSketchOverview.html
+++ b/output/docs/Quantiles/QuantilesSketchOverview.html
@@ -514,8 +514,8 @@
<p>This is a stochastic streaming sketch that enables near-real time analysis
of the
approximate distribution of comparable values from a very large stream in a
single pass.
-The analysis is obtained using a getQuantiles() function or its inverse
functions the
-Probability Mass Function from getPMF() and the Cumulative Distribution
Function from getCDF().</p>
+The analysis is obtained using a getQuantiles() function or its inverse
functions, the
+Probability Mass Function, getPMF(), and the Cumulative Distribution Function,
getCDF().</p>
<ul>
<li><strong>NOTE:</strong> See also the <a
href="/docs/KLL/KLLSketch.html">KLL Sketch</a>.</li>
@@ -584,8 +584,7 @@ way off.</p>
<h3 id="more-code-snippets">More Code Snippets</h3>
-<p>Code examples are best gleaned from the test code that exercises all the
various capabilities of the
-sketch. Here are some brief snippets, simpler than the above graphs, to get
you started.</p>
+<p>Code examples are best gleaned from the test code that exercises all the
various capabilities of the sketch. Here are some brief snippets, simpler than
the above graphs, to get you started.</p>
<h4 id="median-and-top-quartile">Median and Top Quartile</h4>
@@ -745,8 +744,7 @@ MyItem[] itemSplitPoints =
sketch.getQuantiles(rankFractions);
<p>The quantiles algorithm is an implementation of the Low Discrepancy
Mergeable Quantiles Sketch, using double values, described in section 3.2 of
the journal version of the paper “Mergeable Summaries” by Agarwal, Cormode,
Huang, Phillips, Wei, and Yi.
<a href="http://dblp.org/rec/html/journals/tods/AgarwalCHPWY13"></a> <!-- does
not work with https --></p>
-<p>This algorithm is independent of the distribution of values, which can be
anywhere in the
-range of the IEEE-754 64-bit doubles.</p>
+<p>This algorithm is independent of the distribution of values, which can be
anywhere in the range of the IEEE-754 64-bit doubles.</p>
<p>This algorithm intentionally inserts randomness into the sampling process
for values that
ultimately get retained in the sketch. The result is that this algorithm is
not
diff --git a/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
b/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
index 0d45b6fa..ca43bf97 100644
--- a/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
+++ b/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
@@ -525,7 +525,7 @@ of quantiles, ranks and their functions.</p>
<ul>
<li>
- <p>The <strong>natural rank</strong> is a <strong>natural number</strong>
from the set of one-based, natural numbers, ℕ<sub>1</sub>, and is derived by
enumerating an ordered set of values, starting with the value 1, up to
<em>n</em>, the number of values in the set.</p>
+ <p>The <strong>natural rank</strong> is a <strong>natural number</strong>
from the set of one-based, natural numbers, ℕ<sub>1</sub>, and is derived by
enumerating an ordered set of values, starting with the value 1, up to
<em>n</em>, the number of values in the original set.</p>
</li>
<li>The <strong><em>normalized rank</em></strong> is a number between 0.0
and 1.0 computed by dividing the <em>natural rank</em> by the total number of
values in the set, <em>n</em>. Thus, for finite sets, any <em>normalized
rank</em> is in the range (0, 1]. Normalized ranks are often written as a
percent. But don’t confuse percent with percentile! This will be explained
below.</li>
<li>A rank of 0, whether natural or normalized, represents the empty
set.</li>
@@ -585,9 +585,12 @@ To wit:</p>
</tbody>
</table>
+<h3 id="note">Note:</h3>
+<p>The term “value” can be ambiguous because items that we stream into a
sketch are values and numeric ranks are also values. To avoid this ambiguity,
we will use the term “quantiles” to refer to values that are streamed into a
sketch even before they have been associated with a rank.</p>
+
<p>Let’s define the simple functions</p>
-<h3
id="quantilerank-or-qr--return-the-quantile-value-q-associated-with-a-given-rank-r"><strong><em>quantile(rank)</em></strong>
or <strong><em>q(r)</em></strong> := return the quantile value
<strong><em>q</em></strong> associated with a given <strong><em>rank,
r</em></strong>.</h3>
+<h3
id="quantilerank-or-qr--return-the-quantile-q-associated-with-a-given-rank-r"><strong><em>quantile(rank)</em></strong>
or <strong><em>q(r)</em></strong> := return the quantile
<strong><em>q</em></strong> associated with a given <strong><em>rank,
r</em></strong>.</h3>
<h3
id="rankquantile-or-rq--return-the-rank-r-associated-with-a-given-quantile-q"><strong><em>rank(quantile)</em></strong>
or <strong><em>r(q)</em></strong> := return the rank
<strong><em>r</em></strong> associated with a given <strong><em>quantile,
q</em></strong>.</h3>
@@ -619,7 +622,7 @@ To wit:</p>
<p>And this is certainly true of the table above.</p>
<h2 id="the-challenge-of-duplicates">The challenge of duplicates</h2>
-<p>With real data we often encounter duplicate values in the stream. Let’s
examine this next table.</p>
+<p>With real data we often encounter duplicate quantiles in the stream. Let’s
examine this next table.</p>
<table>
<thead>
@@ -644,11 +647,10 @@ To wit:</p>
</tbody>
</table>
-<p>As you can see <em>q(r)</em> is straightforward. But how about
<em>r(q)</em>? Which of the rank values 2, 3, or 4 should the function return
given the value 20? Given this data, and our definitions so far,
-the function <em>r(q)</em> is ambiguous. We will see how to resolve this
shortly.</p>
+<p>As you can see <em>q(r)</em> is straightforward. But how about
<em>r(q)</em>? Which of the ranks 2, 3, or 4 should the function return, given
the quantile 20? Given this data, and our definitions so far, the function
<em>r(q)</em> is ambiguous. We will see how to resolve this shortly.</p>
<h2 id="the-challenge-of-approximation">The challenge of approximation</h2>
-<p>By definition, sketching algorithms are approximate, and they achieve their
high performance by discarding data. Suppose you feed <em>n</em> items into a
sketch that retains only <em>m < n</em> items. This means <em>n-m</em>
values were discarded. The sketch must track the value <em>n</em> used for
computing the rank and quantile functions. When the sketch reconstructs the
relationship between ranks and values <em>n-m</em> rank values are missing
creating holes in the sequence o [...]
+<p>By definition, sketching algorithms are approximate, and they achieve their
high performance by discarding data. Suppose you feed <em>n</em> quantiles
into a sketch that retains only <em>m < n</em> quantiles. This means
<em>n-m</em> quantiles were discarded. The sketch must track the quantity
<em>n</em> used for computing the rank and quantile functions. When the sketch
reconstructs the relationship between ranks and quantiles, <em>n-m</em>
quantiles are missing creating holes in [...]
<p>The raw data might look like this, with its associated natural ranks.</p>
@@ -685,7 +687,7 @@ the function <em>r(q)</em> is ambiguous. We will see how to
resolve this shortly
</tbody>
</table>
-<p>The sketch might discard the even values producing something like this:</p>
+<p>The sketch might discard the even numbered quantiles producing something
like this:</p>
<table>
<thead>
@@ -710,12 +712,12 @@ the function <em>r(q)</em> is ambiguous. We will see how
to resolve this shortly
</tbody>
</table>
-<p>When the sketch deletes values it adjusts the associated ranks by
effectively increasing the “weight” of adjacent items so that they are
positionally approximately correct and the top rank corresponds to
<em>n</em>.</p>
+<p>When the sketch deletes quantiles it adjusts the associated ranks by
effectively increasing the “weight” of adjacent quantiles so that they are
approximately positionally correct and the top natural rank corresponds to
<em>n</em>.</p>
<p>How do we resolve <em>q(3)</em> or <em>r(20)</em>?</p>
<h2 id="the-need-for-inequality-search">The need for inequality search</h2>
-<p>The quantile sketch algorithms discussed in the literature primarily differ
by how they choose which values in the stream should be discarded. After the
elimination process, all of the quantiles sketch implementations are left with
the challenge of how to reconstruct the actual distribution, approximately and
with good accuracy.</p>
+<p>The quantile sketch algorithms discussed in the literature primarily differ
by how they choose which quantiles in the stream should be discarded. After the
elimination process, all of the quantiles sketch implementations are left with
the challenge of how to reconstruct the actual distribution, approximately and
with good accuracy.</p>
<p>Given the presence of duplicates and absence of values from the stream we
must redefine the above quantile and rank functions as inequalities
<strong>while retaining the properties of 1:1 functions.</strong></p>
@@ -727,7 +729,7 @@ the function <em>r(q)</em> is ambiguous. We will see how to
resolve this shortly
<ul>
<li>
- <p><strong>Rule 1:</strong> Every Quantile that exists in the input stream
or retained by the sketch has an associated Rank.</p>
+ <p><strong>Rule 1:</strong> Every quantile that exists in the input stream
or retained by the sketch has an associated rank.</p>
</li>
<li>
<p><strong>Rule 2:</strong> All of our quantile sketches only retain
quantiles that exist in the actual input stream of quantiles.</p>
@@ -742,8 +744,8 @@ the function <em>r(q)</em> is ambiguous. We will see how to
resolve this shortly
<p><strong>Rule 5:</strong> All of our quantile algorithms compensate for
quantiles removed during the sketch quantile selection and compression process
by increasing the weights of some of the quantiles not selected for removal,
such that:</p>
<ul>
- <li>The sum of the natural weights of all quantiles retained by the
sketch equals <strong>N</strong>, the total count of all quantiles given to the
sketch.</li>
- <li>And by corollary, the largest quantile, when sorted by cumulative
rank, has a cumulative natural rank of <strong>N</strong>, or equivalently, a
cumulative normalized rank of <strong>1.0</strong>.</li>
+ <li>The sum of the natural weights of all quantiles retained by the
sketch equals <strong>n</strong>, the total count of all quantiles given to the
sketch.</li>
+ <li>And by corollary, the largest quantile, when sorted by cumulative
rank, has a cumulative natural rank of <strong>n</strong>, or equivalently, a
cumulative normalized rank of <strong>1.0</strong>.</li>
</ul>
</li>
</ul>
@@ -1874,6 +1876,8 @@ the function <em>r(q)</em> is ambiguous. We will see how
to resolve this shortly
<h3
id="quantilerank-exclusive_strict-or-qr-gt_strict-given-r-return-the-quantile-q-of-the-smallest-rank-that-is-strictly-greater-than-r"><strong><em>quantile(rank,
EXCLUSIVE_STRICT)</em></strong> or <strong><em>q(r, GT_STRICT)</em></strong>
:=<br />Given <em>r</em>, return the quantile, <em>q</em>, of the smallest rank
that is strictly Greater Than <em>r</em>.</h3>
+<h3
id="note-this-rule-is-marginal-in-its-usefulness-so-it-is-not-currently-implemented">Note:
This rule is marginal in its usefulness so it is not currently
implemented.</h3>
+
<p><b>Implementation:</b></p>
<ul>
@@ -2210,7 +2214,7 @@ the function <em>r(q)</em> is ambiguous. We will see how
to resolve this shortly
</tbody>
</table>
-<h2
id="these-inequality-functions-maintain-the-11-functional-relationship">These
inequality functions maintain the 1:1 functional relationship</h2>
+<h2
id="these-inequality-functions-maintain-the-11-functional-relationship-approximately">These
inequality functions maintain the 1:1 functional relationship,
approximately.</h2>
<h3
id="the-exclusive-search-for-qr-is-the-inverse-of-the-exclusive-search-for-rq">The
<em>exclusive</em> search for q(r) is the inverse of the <em>exclusive</em>
search for r(q).</h3>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]