[datasketches-website] branch asf-site updated: Automatic Site Publish by Buildbot

git-site-role Fri, 30 Sep 2022 07:56:33 -0700

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 5efd6b11 Automatic Site Publish by Buildbot
5efd6b11 is described below

commit 5efd6b11a4842da9e0d119dfd3be52a5ab7d3066
Author: buildbot <[email protected]>
AuthorDate: Fri Sep 30 14:54:42 2022 +0000

    Automatic Site Publish by Buildbot
---
 output/docs/Community/Research.html                |  1 +
 output/docs/KLL/KLLAccuracyAndSize.html            |  2 +-
 output/docs/Quantiles/QuantilesOverview.html       | 47 +++++++++++++++++-----
 output/docs/Quantiles/QuantilesReferences.html     |  1 +
 output/docs/Quantiles/QuantilesSketchOverview.html | 10 ++---
 .../SketchingQuantilesAndRanksTutorial.html        | 30 ++++++++------
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/output/docs/Community/Research.html 
b/output/docs/Community/Research.html
index 0661a401..dbef74a9 100644
--- a/output/docs/Community/Research.html
+++ b/output/docs/Community/Research.html
@@ -636,6 +636,7 @@ All algorithms in the library produce mergeable summaries, 
and come with formal
 
 <p><strong>[VSGB05]</strong> Shobha Venkataraman, Dawn Xiaodong Song, Phillip 
B. Gibbons, and Avrim Blum. New streaming algorithms for fast detection of 
superspreaders. In <em>Internet Society NDSS Proceedings</em>, 2005.</p>
 
+<p><strong>[CKLTV]</strong> Graham Cormode, Zohar Karnin, Edo Liberty, Justin 
Thaler, Pavel Veselý. Relative Error Streaming Quantiles. In <em>PODS ‘21 
Proceedings</em>, 2021.</p>
 
       </div> <!-- End content -->
     </div> <!-- End row -->
diff --git a/output/docs/KLL/KLLAccuracyAndSize.html 
b/output/docs/KLL/KLLAccuracyAndSize.html
index 0bd07a3d..b718455f 100644
--- a/output/docs/KLL/KLLAccuracyAndSize.html
+++ b/output/docs/KLL/KLLAccuracyAndSize.html
@@ -513,7 +513,7 @@
 <p>The accuracy of a quantile sketch is a function of the configured value 
<i>K</i>, which also affects
 the overall size of the sketch (default K = 200).</p>
 
-<p>Accuracy for quantiles sketches is specified and measured with respect to 
the <em>rank</em> only, not the values.</p>
+<p>The accuracy quantiles sketches is specified and measured with respect to 
the <em>rank</em> only, not the values.</p>
 
 <p>The KLL Sketch has <em>absolute error</em>. For example, a specified 
accuracy of 1% at the median (rank = 0.50) means that the true value (if you 
could extract it from the set) should be 
 between <em>getQuantile(0.49)</em> and <em>getQuantile(0.51)</em>. This same 
1% error applied at a rank of 0.95 means that the true value should be between 
<em>getQuantile(0.94)</em> and <em>getQuantile(0.96)</em>. In other words, the 
error is a fixed +/- epsilon for the entire range of rank values.</p>
diff --git a/output/docs/Quantiles/QuantilesOverview.html 
b/output/docs/Quantiles/QuantilesOverview.html
index 12b30c19..3524a45f 100644
--- a/output/docs/Quantiles/QuantilesOverview.html
+++ b/output/docs/Quantiles/QuantilesOverview.html
@@ -512,7 +512,13 @@
 
 <h2 id="quantile-type-sketches-in-the-library">Quantile-type sketches in the 
library</h2>
 
-<p>There are three types of quantiles sketches in the library. These sketches 
have many parallel methods. Please refer to the individual Javadocs for more 
information.</p>
+<p>This is an overview of the three types of quantiles sketches in the 
library. Each of these quantile types may have one or more specific 
implementtaions.</p>
+
+<p>The mathematical error bounds of all the quantile sketches is specified 
with respect to rank and not with respect to quantile values.  In other words, 
the difference between the rank upper bound and the rank lower bound is the 
confidence interval and can be expressed as a percent of the overall rank 
distribution (which is 1.0) and is the mathematically derived error for a 
specific configuration of the sketch.</p>
+
+<p>Although the quantile upper bound and quantile lower bounds can be 
approximately computed from the rank upper bound and rank lower bound, and the 
difference between the quantile bounds is also an approximate confidence 
interval, the size of the quantile confidence interval may not be meaningful 
and is not constrained by the defined error of the sketch.</p>
+
+<p>These sketches have many parallel methods. Please refer to the individual 
Javadocs for more information.</p>
 
 <h3 id="the-req-sketch">The REQ Sketch</h3>
 
@@ -531,16 +537,16 @@
       <li>Directory: req</li>
     </ul>
   </li>
-  <li>Key Features
+  <li>Key Features (both Java &amp; C++)
     <ul>
       <li>Accuracy %: a function of <em>K</em> and <strong>relative</strong> 
with respect to rank. The user can select either High Rank Accuracy (HRA) or 
Low Rank Accuracy (LRA). This enables extremely high accuracy for the ends of 
the rank domain. E.g., 99.999%ile quantiles.</li>
-      <li>User selectable comparison criteria:
+      <li>User selectable comparison QuantileSearchCriteria:
         <ul>
-          <li>Less-Than (LT), which is compatible with the KLL and older 
Quantiles Sketch</li>
-          <li>Less-Than-or-Equals (LE), a common definition in some of the 
theoretical literature.</li>
+          <li>Exclusive, which is compatible with the KLL and older Quantiles 
Sketch</li>
+          <li>Inclusive, a common definition in some of the theoretical 
literature.</li>
         </ul>
       </li>
-      <li>Java: Dedicated <em>float</em> implementation</li>
+      <li>Java: Dedicated <em>float</em> implementation.</li>
       <li>C++: Template implementation for arbitrary comparible types.</li>
       <li>Python: Dedicated <em>float</em> and <em>integer</em> 
implementations.</li>
     </ul>
@@ -564,11 +570,17 @@
       <li>Directory: kll</li>
     </ul>
   </li>
-  <li>Key Features
+  <li>Key Features (both Java &amp; C++)
     <ul>
-      <li>Accuracy %: a function of <em>K</em> and independent of rank.*</li>
+      <li>User selectable comparison QuantileSearchCriteria:
+        <ul>
+          <li>Exclusive, which is compatible with the KLL and older Quantiles 
Sketch</li>
+          <li>Inclusive, a common definition in some of the theoretical 
literature.</li>
+        </ul>
+      </li>
+      <li>Accuracy %: a function of <em>K</em> and independent of rank.</li>
       <li>Near optimal accuracy per sketch size compared to other constant 
accuracy quantiles sketches.</li>
-      <li>Java: Dedicated <em>float</em> implementation</li>
+      <li>Java: Dedicated <em>float</em> and <em>double</em> 
implementations.</li>
       <li>C++: Template implementation for arbitrary comparible types.</li>
       <li>Python: Dedicated <em>float</em> and <em>integer</em> 
implementations</li>
     </ul>
@@ -585,10 +597,23 @@
       <li>Package: org.apache.datasketches.quantiles</li>
     </ul>
   </li>
-  <li>Key Features
+  <li>C++, Python
+    <ul>
+      <li>Release 1.0.0, 17 Sep 2019</li>
+      <li>Repo: <a 
href="https://github.com/apache/datasketches-cpp";>https://github.com/apache/datasketches-cpp</a></li>
+      <li>Directory:</li>
+    </ul>
+  </li>
+  <li>Key Features (both Java &amp; C++)
     <ul>
+      <li>User selectable comparison QuantileSearchCriteria:
+        <ul>
+          <li>Exclusive, which is compatible with the KLL and older Quantiles 
Sketch</li>
+          <li>Inclusive, a common definition in some of the theoretical 
literature.</li>
+        </ul>
+      </li>
       <li>Accuracy %: a function of <em>K</em> and independent of rank.</li>
-      <li>Dedicated <em>double</em> implentation</li>
+      <li>Dedicated <em>double</em> implentation.</li>
       <li>java generic implementation for arbitrary comparible types.</li>
       <li>The dedicated <em>double</em> implementation can be configured for 
off-heap operation.</li>
     </ul>
diff --git a/output/docs/Quantiles/QuantilesReferences.html 
b/output/docs/Quantiles/QuantilesReferences.html
index baf37cc9..1787e3c5 100644
--- a/output/docs/Quantiles/QuantilesReferences.html
+++ b/output/docs/Quantiles/QuantilesReferences.html
@@ -520,6 +520,7 @@
   <li>In J. Gehrke M. Garofalakis and R. Rastogi, editors, In Data Stream 
Management: Processing High-Speed Data Streams. Springer, 2016.</li>
   <li>David Felber and Rafail Ostrovsky. A randomized online quantile summary 
in O((1/ε) log(1/ε))</li>
   <li>Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý. 
Relative Error Streaming Quantiles. https://arxiv.org/abs/2004.01668.</li>
+  <li>Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý. 
Relative Error Streaming Quantiles. In <em>PODS ‘21 Proceedings</em>, 2021.</li>
 </ul>
 
       </div> <!-- End content -->
diff --git a/output/docs/Quantiles/QuantilesSketchOverview.html 
b/output/docs/Quantiles/QuantilesSketchOverview.html
index 2c0b7253..f2339189 100644
--- a/output/docs/Quantiles/QuantilesSketchOverview.html
+++ b/output/docs/Quantiles/QuantilesSketchOverview.html
@@ -514,8 +514,8 @@
 
 <p>This is a stochastic streaming sketch that enables near-real time analysis 
of the 
 approximate distribution of comparable values from a very large stream in a 
single pass. 
-The analysis is obtained using a getQuantiles() function or its inverse 
functions the 
-Probability Mass Function from getPMF() and the Cumulative Distribution 
Function from getCDF().</p>
+The analysis is obtained using a getQuantiles() function or its inverse 
functions, the 
+Probability Mass Function, getPMF(), and the Cumulative Distribution Function, 
getCDF().</p>
 
 <ul>
   <li><strong>NOTE:</strong> See also the <a 
href="/docs/KLL/KLLSketch.html">KLL Sketch</a>.</li>
@@ -584,8 +584,7 @@ way off.</p>
 
 <h3 id="more-code-snippets">More Code Snippets</h3>
 
-<p>Code examples are best gleaned from the test code that exercises all the 
various capabilities of the
-sketch.  Here are some brief snippets, simpler than the above graphs, to get 
you started.</p>
+<p>Code examples are best gleaned from the test code that exercises all the 
various capabilities of the sketch.  Here are some brief snippets, simpler than 
the above graphs, to get you started.</p>
 
 <h4 id="median-and-top-quartile">Median and Top Quartile</h4>
 
@@ -745,8 +744,7 @@ MyItem[] itemSplitPoints = 
sketch.getQuantiles(rankFractions);
 <p>The quantiles algorithm is an implementation of the Low Discrepancy 
Mergeable Quantiles Sketch, using double values, described in section 3.2 of 
the journal version of the paper “Mergeable Summaries” by Agarwal, Cormode, 
Huang, Phillips, Wei, and Yi. 
 <a href="http://dblp.org/rec/html/journals/tods/AgarwalCHPWY13";></a> <!-- does 
not work with https --></p>
 
-<p>This algorithm is independent of the distribution of values, which can be 
anywhere in the
-range of the IEEE-754 64-bit doubles.</p>
+<p>This algorithm is independent of the distribution of values, which can be 
anywhere in the range of the IEEE-754 64-bit doubles.</p>
 
 <p>This algorithm intentionally inserts randomness into the sampling process 
for values that
 ultimately get retained in the sketch. The result is that this algorithm is 
not 
diff --git a/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html 
b/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
index 0d45b6fa..ca43bf97 100644
--- a/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
+++ b/output/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html
@@ -525,7 +525,7 @@ of quantiles, ranks and their functions.</p>
 
 <ul>
   <li>
-    <p>The <strong>natural rank</strong> is a <strong>natural number</strong> 
from the set of one-based, natural numbers, ℕ<sub>1</sub>, and is derived by 
enumerating an ordered set of values, starting with the value 1, up to 
<em>n</em>, the number of values in the set.</p>
+    <p>The <strong>natural rank</strong> is a <strong>natural number</strong> 
from the set of one-based, natural numbers, ℕ<sub>1</sub>, and is derived by 
enumerating an ordered set of values, starting with the value 1, up to 
<em>n</em>, the number of values in the original set.</p>
   </li>
   <li>The <strong><em>normalized rank</em></strong> is a number between 0.0 
and 1.0 computed by dividing the <em>natural rank</em> by the total number of 
values in the set, <em>n</em>. Thus, for finite sets, any <em>normalized 
rank</em> is in the range (0, 1]. Normalized ranks are often written as a 
percent. But don’t confuse percent with percentile! This will be explained 
below.</li>
   <li>A rank of 0, whether natural or normalized, represents the empty 
set.</li>
@@ -585,9 +585,12 @@ To wit:</p>
   </tbody>
 </table>
 
+<h3 id="note">Note:</h3>
+<p>The term “value” can be ambiguous because items that we stream into a 
sketch are values and numeric ranks are also values.  To avoid this ambiguity, 
we will use the term “quantiles” to refer to values that are streamed into a 
sketch even before they have been associated with a rank.</p>
+
 <p>Let’s define the simple functions</p>
 
-<h3 
id="quantilerank-or-qr--return-the-quantile-value-q-associated-with-a-given-rank-r"><strong><em>quantile(rank)</em></strong>
 or <strong><em>q(r)</em></strong> := return the quantile value 
<strong><em>q</em></strong> associated with a given <strong><em>rank, 
r</em></strong>.</h3>
+<h3 
id="quantilerank-or-qr--return-the-quantile-q-associated-with-a-given-rank-r"><strong><em>quantile(rank)</em></strong>
 or <strong><em>q(r)</em></strong> := return the quantile 
<strong><em>q</em></strong> associated with a given <strong><em>rank, 
r</em></strong>.</h3>
 
 <h3 
id="rankquantile-or-rq--return-the-rank-r-associated-with-a-given-quantile-q"><strong><em>rank(quantile)</em></strong>
 or <strong><em>r(q)</em></strong> := return the rank 
<strong><em>r</em></strong> associated with a given <strong><em>quantile, 
q</em></strong>.</h3>
 
@@ -619,7 +622,7 @@ To wit:</p>
 <p>And this is certainly true of the table above.</p>
 
 <h2 id="the-challenge-of-duplicates">The challenge of duplicates</h2>
-<p>With real data we often encounter duplicate values in the stream. Let’s 
examine this next table.</p>
+<p>With real data we often encounter duplicate quantiles in the stream. Let’s 
examine this next table.</p>
 
 <table>
   <thead>
@@ -644,11 +647,10 @@ To wit:</p>
   </tbody>
 </table>
 
-<p>As you can see <em>q(r)</em> is straightforward. But how about 
<em>r(q)</em>?  Which of the rank values 2, 3, or 4 should the function return 
given the value 20?  Given this data, and our definitions so far, 
-the function <em>r(q)</em> is ambiguous. We will see how to resolve this 
shortly.</p>
+<p>As you can see <em>q(r)</em> is straightforward. But how about 
<em>r(q)</em>?  Which of the ranks 2, 3, or 4 should the function return, given 
the quantile 20?  Given this data, and our definitions so far, the function 
<em>r(q)</em> is ambiguous. We will see how to resolve this shortly.</p>
 
 <h2 id="the-challenge-of-approximation">The challenge of approximation</h2>
-<p>By definition, sketching algorithms are approximate, and they achieve their 
high performance by discarding  data.  Suppose you feed <em>n</em> items into a 
sketch that retains only <em>m &lt; n</em> items. This means <em>n-m</em> 
values were discarded.  The sketch must track the value <em>n</em> used for 
computing the rank and quantile functions. When the sketch reconstructs the 
relationship between ranks and values <em>n-m</em> rank values are missing 
creating holes in the sequence o [...]
+<p>By definition, sketching algorithms are approximate, and they achieve their 
high performance by discarding data.  Suppose you feed <em>n</em> quantiles 
into a sketch that retains only <em>m &lt; n</em> quantiles. This means 
<em>n-m</em> quantiles were discarded.  The sketch must track the quantity 
<em>n</em> used for computing the rank and quantile functions. When the sketch 
reconstructs the relationship between ranks and quantiles, <em>n-m</em> 
quantiles are missing creating holes in [...]
 
 <p>The raw data might look like this, with its associated natural ranks.</p>
 
@@ -685,7 +687,7 @@ the function <em>r(q)</em> is ambiguous. We will see how to 
resolve this shortly
   </tbody>
 </table>
 
-<p>The sketch might discard the even values producing something like this:</p>
+<p>The sketch might discard the even numbered quantiles producing something 
like this:</p>
 
 <table>
   <thead>
@@ -710,12 +712,12 @@ the function <em>r(q)</em> is ambiguous. We will see how 
to resolve this shortly
   </tbody>
 </table>
 
-<p>When the sketch deletes values it adjusts the associated ranks by 
effectively increasing the “weight” of adjacent items so that they are 
positionally approximately correct and the top rank corresponds to 
<em>n</em>.</p>
+<p>When the sketch deletes quantiles it adjusts the associated ranks by 
effectively increasing the “weight” of adjacent quantiles so that they are 
approximately positionally correct and the top natural rank corresponds to 
<em>n</em>.</p>
 
 <p>How do we resolve <em>q(3)</em> or <em>r(20)</em>?</p>
 
 <h2 id="the-need-for-inequality-search">The need for inequality search</h2>
-<p>The quantile sketch algorithms discussed in the literature primarily differ 
by how they choose which values in the stream should be discarded. After the 
elimination process, all of the quantiles sketch implementations are left with 
the challenge of how to reconstruct the actual distribution, approximately and 
with good accuracy.</p>
+<p>The quantile sketch algorithms discussed in the literature primarily differ 
by how they choose which quantiles in the stream should be discarded. After the 
elimination process, all of the quantiles sketch implementations are left with 
the challenge of how to reconstruct the actual distribution, approximately and 
with good accuracy.</p>
 
 <p>Given the presence of duplicates and absence of values from the stream we 
must redefine the above quantile and rank functions as inequalities 
<strong>while retaining the properties of 1:1 functions.</strong></p>
 
@@ -727,7 +729,7 @@ the function <em>r(q)</em> is ambiguous. We will see how to 
resolve this shortly
 
 <ul>
   <li>
-    <p><strong>Rule 1:</strong> Every Quantile that exists in the input stream 
or retained by the sketch has an associated Rank.</p>
+    <p><strong>Rule 1:</strong> Every quantile that exists in the input stream 
or retained by the sketch has an associated rank.</p>
   </li>
   <li>
     <p><strong>Rule 2:</strong> All of our quantile sketches only retain 
quantiles that exist in the actual input stream of quantiles.</p>
@@ -742,8 +744,8 @@ the function <em>r(q)</em> is ambiguous. We will see how to 
resolve this shortly
     <p><strong>Rule 5:</strong> All of our quantile algorithms compensate for 
quantiles removed during the sketch quantile selection and compression process 
by increasing the weights of some of the quantiles not selected for removal, 
such that:</p>
 
     <ul>
-      <li>The sum of the natural weights of all quantiles retained by the 
sketch equals <strong>N</strong>, the total count of all quantiles given to the 
sketch.</li>
-      <li>And by corollary, the largest quantile, when sorted by cumulative 
rank, has a cumulative natural rank of <strong>N</strong>, or equivalently, a 
cumulative normalized rank of <strong>1.0</strong>.</li>
+      <li>The sum of the natural weights of all quantiles retained by the 
sketch equals <strong>n</strong>, the total count of all quantiles given to the 
sketch.</li>
+      <li>And by corollary, the largest quantile, when sorted by cumulative 
rank, has a cumulative natural rank of <strong>n</strong>, or equivalently, a 
cumulative normalized rank of <strong>1.0</strong>.</li>
     </ul>
   </li>
 </ul>
@@ -1874,6 +1876,8 @@ the function <em>r(q)</em> is ambiguous. We will see how 
to resolve this shortly
 
 <h3 
id="quantilerank-exclusive_strict-or-qr-gt_strict-given-r-return-the-quantile-q-of-the-smallest-rank-that-is-strictly-greater-than-r"><strong><em>quantile(rank,
 EXCLUSIVE_STRICT)</em></strong> or <strong><em>q(r, GT_STRICT)</em></strong> 
:=<br />Given <em>r</em>, return the quantile, <em>q</em>, of the smallest rank 
that is strictly Greater Than <em>r</em>.</h3>
 
+<h3 
id="note-this-rule-is-marginal-in-its-usefulness-so-it-is-not-currently-implemented">Note:
 This rule is marginal in its usefulness so it is not currently 
implemented.</h3>
+
 <p><b>Implementation:</b></p>
 
 <ul>
@@ -2210,7 +2214,7 @@ the function <em>r(q)</em> is ambiguous. We will see how 
to resolve this shortly
   </tbody>
 </table>
 
-<h2 
id="these-inequality-functions-maintain-the-11-functional-relationship">These 
inequality functions maintain the 1:1 functional relationship</h2>
+<h2 
id="these-inequality-functions-maintain-the-11-functional-relationship-approximately">These
 inequality functions maintain the 1:1 functional relationship, 
approximately.</h2>
 
 <h3 
id="the-exclusive-search-for-qr-is-the-inverse-of-the-exclusive-search-for-rq">The
 <em>exclusive</em> search for q(r) is the inverse of the <em>exclusive</em> 
search for r(q).</h3>
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[datasketches-website] branch asf-site updated: Automatic Site Publish by Buildbot

Reply via email to