This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7b5e665c Automatic Site Publish by Buildbot
7b5e665c is described below
commit 7b5e665c7f11c721a2eb6c6f82d5b0cb2a0b3585
Author: buildbot <[email protected]>
AuthorDate: Tue Sep 24 01:46:08 2024 +0000
Automatic Site Publish by Buildbot
---
output/docs/KLL/KLLSketch.html | 133 +++++++++++++++++----
output/docs/Quantiles/ClassicQuantilesSketch.html | 56 ++++++++-
output/docs/Quantiles/QuantilesSketchOverview.html | 28 +++--
3 files changed, 181 insertions(+), 36 deletions(-)
diff --git a/output/docs/KLL/KLLSketch.html b/output/docs/KLL/KLLSketch.html
index a0e13500..a6764edf 100644
--- a/output/docs/KLL/KLLSketch.html
+++ b/output/docs/KLL/KLLSketch.html
@@ -352,45 +352,116 @@
specific language governing permissions and limitations
under the License.
-->
+<h2 id="contents">Contents</h2>
+<!-- TOC -->
+<ul>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesOverview.html">Introduction
to the Quantile Sketches</a></li>
+ <li><a href="#kll-sketch">Kll Sketch</a>
+ <ul>
+ <li><a href="#comparing">Comparing the KllSketches with the original
classic Quantiles Sketches</a></li>
+ <li><a href="#plots">Plots for KllDoublesSketch vs. classic Quantiles
DoublesSketch</a></li>
+ <li><a href="#simple-example">Simple Java KLL Example</a></li>
+ </ul>
+ </li>
+ <li><a
href="https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html">KLL
Accuracy And Size</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/KLL/UnderstandingKLLBounds.html">Understanding
KLL Bounds</a></li>
+ <li>Examples
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/KLL/KLLCppExample.html">KLL Sketch
C++ Example</a></li>
+ </ul>
+ </li>
+ <li>Tutorials
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html">Sketching
Quantiles and Ranks Tutorial</a></li>
+ </ul>
+ </li>
+ <li>Theory
+ <ul>
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/Quantiles_KLL.pdf">Optimal
Quantile Approximation in Streams</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesReferences.html">References</a>
+<!-- TOC --></li>
+ </ul>
+ </li>
+</ul>
+
+<p><a id="kll-sketch"></a></p>
<h2 id="kll-sketch">KLL Sketch</h2>
-<p>Implementation of a very compact quantiles sketch with lazy compaction
scheme and nearly optimal accuracy per bit.
-See <a href="https://arxiv.org/abs/1603.05346v2">Optimal Quantile
Approximation in Streams, by Zohar Karnin, Kevin Lang, Edo Liberty</a>.
-The name KLL is composed of the initial letters of the last names of the
authors.</p>
+<p>This is an implementation of a very compact quantiles sketch with lazy
compaction scheme and nearly optimal accuracy per bit of storage. The
underlying theoretical work is the paper
+<a href="https://arxiv.org/abs/1603.05346v2">Optimal Quantile Approximation in
Streams, by Zohar Karnin, Kevin Lang, Edo Liberty</a>. The name KLL is composed
of the initial letters of the last names of the authors.</p>
-<p>The usage of KllSketch is very similar to the classic quantiles
DoublesSketch.</p>
+<p>This implementation includes 16 variations of the KLL Sketch, including a
base KllSketch for methods common to all sketches. The implementation
variations are across 3 different dimensions:</p>
<ul>
- <li>The key feature of this sketch is its compactness for a given
accuracy.</li>
- <li>It is separately implemented for both float and double values and can be
configured for use on-heap or off-heap (Direct mode).</li>
- <li>The parameter K that affects the accuracy and the size of the sketch is
not restricted to powers of 2.</li>
- <li>The default of 200 was chosen to yield approximately the same normalized
rank error (1.65%) as the classic quantiles DoublesSketch (K=128, error
1.73%).</li>
+ <li>Input type: double, float, long, item(generic)</li>
+ <li>Memory type: heap, direct (off-heap)</li>
+ <li>Stored Size: compact (read-only), updatable</li>
</ul>
-<h3 id="java-example">Java example</h3>
+<p>With the one exception that the KllItemSketch is not available in direct,
updatable form.
+The classes are organized in an inheritance hierarchy as follows:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
+<ul>
+ <li>Public KllSketch
+ <ul>
+ <li>Public KllDoublesSketch
+ <ul>
+ <li>KllHeapDoublesSketch</li>
+ <li>KllDirectDoublesSketch
+ <ul>
+ <li>KllDirectCompactDoublesSketch</li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ <li>Public KllFloatsSketch
+ <ul>
+ <li>KllHeapFloatsSketch</li>
+ <li>KllDirectFloatsSketch
+ <ul>
+ <li>KllDirectCompactFloatsSketch</li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ <li>Public KllItemsSketch<T>
+</T> <ul>
+ <li>KllHeapItemsSketch<T></T></li>
+ <li>KllDirectCompactItemsSketch<T></T></li>
+ </ul>
+ </li>
+ <li>Public KllLongsSketch
+ <ul>
+ <li>KllHeapLongsSketch</li>
+ <li>KllDirectLongsSketch
+ <ul>
+ <li>KllDirectCompactLongsSketch</li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+</ul>
-KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
-int n = 1000000;
-for (int i = 0; i < n; i++) {
- sketch.update(i);
-}
-float median = sketch.getQuantile(0.5);
-double rankOf1000 = sketch.getRank(1000);
-</code></pre></div></div>
+<p>The internal package-private variations are constructed using static
factory methods from the 4 outer public classes for doubles, floats, items, and
longs, respectively</p>
-<h3
id="differences-of-kllsketch-from-the-original-quantiles-doublessketch">Differences
of KllSketch from the original quantiles DoublesSketch</h3>
+<p><a id="comparing"></a></p>
+<h3
id="comparing-the-kll-sketches-with-the-original-classic-quantiles-sketches">Comparing
the KLL Sketches with the original classic Quantiles Sketches</h3>
+
+<p>The usage of KllDoublesSketch is very similar to the classic quantiles
DoublesSketch.</p>
<ul>
- <li>KLL has a smaller size for the same accuracy.</li>
- <li>KLL is slightly faster to update.</li>
- <li>The KLL parameter K doesn’t have to be power of 2.</li>
- <li>KLL operates with either float values or double values.</li>
+ <li>The key feature of KLL sketch is its compactness for a given accuracy.
KLL has a much smaller size for the same accuracy (see the plots below).</li>
+ <li>The KLL parameter K, which affects accuracy and size, doesn’t have to be
power of 2. The default K of 200 was chosen to yield approximately the same
normalized rank error (1.65%) as the classic quantiles DoublesSketch (K=128,
error 1.73%).</li>
+ <li>The classic quantiles sketch only has double and item(generic) input
types, while KLL (as mentioned above) is implemented with double, float, long,
and item(generic) types.</li>
<li>KLL uses a merge method rather than a union object.</li>
</ul>
-<p>The starting point for the comparison is setting K in such a way that rank
error would be approximately the same. As pointed out above, the default K for
both sketches should achieve this. Here is the comparison of the single-sided
normalized rank error (getRank() method) for the default K:</p>
+<p><a id="plots"></a></p>
+<h3
id="plot-comparisons-of-klldoublessketch-vs-classic-quantiles-doublessketch">Plot
Comparisons of KllDoublesSketch vs. classic Quantiles DoublesSketch</h3>
+
+<p>The starting point for the plot comparisons is setting K in such a way that
rank error would be approximately the same. As pointed out above, the default K
for both sketches should achieve this. Here is the comparison of the
single-sided normalized rank error (getRank() method) for the default K:</p>
<p><img class="doc-img-full"
src="/docs/img/kll/kll200-vs-ds128-rank-error.png" alt="RankError" /></p>
@@ -412,6 +483,20 @@ double rankOf1000 = sketch.getRank(1000);
<p><img class="doc-img-full" src="/docs/img/kll/kll200-vs-ds128-update.png"
alt="UpdateTime" /></p>
+<p><a id="simple-example"></a></p>
+<h3 id="simple-java-kll-floats-example">Simple Java KLL Floats example</h3>
+
+<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
+
+KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
+int n = 1000000;
+for (int i = 0; i < n; i++) {
+ sketch.update(i);
+}
+float median = sketch.getQuantile(0.5);
+double rankOf1000 = sketch.getRank(1000);
+</code></pre></div></div>
+
</div> <!-- End content -->
</div> <!-- End row -->
</div> <!-- End Container -->
diff --git a/output/docs/Quantiles/ClassicQuantilesSketch.html
b/output/docs/Quantiles/ClassicQuantilesSketch.html
index 23b37197..1f6850e6 100644
--- a/output/docs/Quantiles/ClassicQuantilesSketch.html
+++ b/output/docs/Quantiles/ClassicQuantilesSketch.html
@@ -352,8 +352,56 @@
specific language governing permissions and limitations
under the License.
-->
+<h2 id="contents">Contents</h2>
+<!-- TOC -->
+<ul>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesOverview.html">Introduction
to the Quantile Sketches</a></li>
+ <li><a href="#classic-quantiles-sketch">Classic Quantiles Sketch</a>
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/Quantiles/QuantilesSketchOverview.html">Classsic
Quantiles Sketch Overview</a></li>
+ <li><a href="#accuracy-and-size">Accuracy and Size</a>
+ <ul>
+ <li><a href="#absolute-vs-relative-error">Absolute vs Relative
Error</a></li>
+ </ul>
+ </li>
+ <li><a href="#data-independence">Quantiles Sketches and Data
Independence</a></li>
+ <li><a href="#accuracy-table">Accuracy Table</a></li>
+ <li><a href="#accuracy-plots">Accuracy Plots</a></li>
+ </ul>
+ </li>
+ <li>Examples
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/Quantiles/QuantilesJavaExample.html">Classic
Quantiles Java Example</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/Quantiles/QuantilesPigUDFs.html">Classic
Quantiles Pig UDFs</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/Quantiles/QuantilesHiveUDFs.html">Classic
Quantiles Hive UDFs</a></li>
+ </ul>
+ </li>
+ <li>Classic Quantiles Studies
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesStudies/DruidApproxHistogramStudy.html">Druid
Approximate Histogram</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesStudies/MomentsSketchStudy.html">Moments
Sketch Study</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesStudies/QuantilesStreamAStudy.html">Quantiles
StreamA Study</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesStudies/ExactQuantiles.html">Exact
Quantiles for Studies</a></li>
+ </ul>
+ </li>
+ <li>Tutorials
+ <ul>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html">Sketching
Quantiles and Ranks Tutorial</a></li>
+ </ul>
+ </li>
+ <li>Theory
+ <ul>
+ <li><a href="https://arxiv.org/abs/2004.01668">Relative Error Streaming
Quantiles</a></li>
+ <li><a
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesReferences.html">More
References</a>
+<!-- TOC --></li>
+ </ul>
+ </li>
+</ul>
+
+<p><a id="classic-quantiles-sketch"></a></p>
<h1 id="classic-quantiles-sketch">Classic Quantiles Sketch</h1>
+<p><a id="accuracy-and-size"></a></p>
<h2 id="quantiles-sketches-accuracy-and-size">Quantiles Sketches Accuracy and
Size</h2>
<p>Please review the Quantiles <a
href="/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html">Tutorial</a>.</p>
@@ -362,12 +410,14 @@ the overall size of the sketch.</p>
<p>Accuracy for quantiles sketches is specified and measured with respect to
the <em>rank</em> only, not the values.</p>
+<p><a id="absolute-vs-relative-error"></a></p>
<h3 id="absolute-vs-relative-error">Absolute vs Relative Error</h3>
<p>The Quantiles/DoublesSketch and the KLL Sketch have <em>absolute
error</em>. For example, a specified accuracy of 1% at the median (rank =
0.50) means that the true value (if you could extract it from the set) should
be
between <em>getQuantile(0.49)</em> and <em>getQuantile(0.51)</em>. This same
1% error applied at a rank of 0.95 means that the true value should be between
<em>getQuantile(0.94)</em> and <em>getQuantile(0.96)</em>. In other words, the
error is a fixed +/- epsilon for the entire range of rank values.</p>
<p>The ReqSketch, however, has relative rank error and the user can choose
which end of the rank domain should have high accuracy. Refer to the sketch
documentation for more information.</p>
+<p><a id="data-independence"></a></p>
<h2 id="quantiles-sketches-and-data-independence">Quantiles Sketches and Data
Independence</h2>
<p>A <em>sketch</em> is an implementation of a <em>streaming algorithm</em>.
By definition, a sketch has only one chance to examine each item of the stream.
It is this property that makes a sketch a <em>streaming</em> algorithm and
useful for real-time analysis of very large streams that may be impractical to
actually store.</p>
@@ -375,7 +425,8 @@ between <em>getQuantile(0.49)</em> and
<em>getQuantile(0.51)</em>. This same 1%
<p>The only thing the user needs to know is how to extract the values from the
stream so that they can be fed into the sketch. It is reasonable that the user
knows the <em>type</em> of values in the stream: e.g., are they alphanumeric
strings, numeric strings, or numeric primitives. These properties may determine
the type of sketch to use as well as how to extract the appropriate quantities
to feed into the sketch.</p>
-<h2
id="accuracy-information-for-the-orgapachedatasketchesquantiles-sketch-package">Accuracy
Information for the org.apache.datasketches.quantiles Sketch Package</h2>
+<p><a id="accuracy-table"></a></p>
+<h2 id="accuracy-table-for-the-classic-quantiles-sketch">Accuracy Table for
the Classic Quantiles Sketch</h2>
<p>A <i>k</i> of 256 produces a normalized rank error of less than 1%.
For example, the median value returned from getQuantile(0.5) will be between
the actual values
from the hypothetically sorted array of input values at normalized ranks of
0.49 and 0.51, with
@@ -427,6 +478,9 @@ Table Guide for Quantiles DoublesSketch Size in Bytes and
Approximate Error:
4,294,967,295 | 3,744 7,200 13,856 26,656 51,232 98,336
188,448 360,480 688,160 1,310,752 2,490,400 4,718,624
</pre>
+<p><a id="accuracy-plots"></a></p>
+<h2 id="accuracy-plots">Accuracy Plots</h2>
+
<p>The following graphs illustrate the ability of the Quantiles DoublesSketch
to characterize value distributions.</p>
<ul>
diff --git a/output/docs/Quantiles/QuantilesSketchOverview.html
b/output/docs/Quantiles/QuantilesSketchOverview.html
index 0d26f0df..4d303ad9 100644
--- a/output/docs/Quantiles/QuantilesSketchOverview.html
+++ b/output/docs/Quantiles/QuantilesSketchOverview.html
@@ -352,7 +352,17 @@
specific language governing permissions and limitations
under the License.
-->
-<h1 id="quantiles-sketch-overview">Quantiles Sketch Overview</h1>
+<h2 id="contents">Contents</h2>
+
+<ul>
+ <li><a href="#overview">Classic Quantiles Sketch Overview</a></li>
+ <li><a href="#numeric-quantiles">Numeric Quantiles</a></li>
+ <li><a href="#extending">Extending Generic Quantiles Classes</a></li>
+ <li><a href="#notes">Implementation Notes</a></li>
+</ul>
+
+<p><a id="overview"></a></p>
+<h1 id="classic-quantiles-sketch-overview">Classic Quantiles Sketch
Overview</h1>
<p>Package: org.apache.datasketches.quantiles</p>
@@ -365,14 +375,8 @@ Probability Mass Function, getPMF(), and the Cumulative
Distribution Function, g
<li><strong>NOTE:</strong> See also the <a
href="/docs/KLL/KLLSketch.html">KLL Sketch</a>.</li>
</ul>
-<h3 id="section-links">Section links:</h3>
-<ul>
- <li><a href="#Section 1">Section 1</a> Numeric Quantiles</li>
- <li><a href="#Section 2">Section 2</a> Extending Generic Quantiles
Classes</li>
- <li><a href="#Section 3">Section 3</a> Implementation Notes</li>
-</ul>
-
-<h2 id="numeric-quantiles"><a name="Section 1"></a>Numeric Quantiles</h2>
+<p><a id="numeric-quantiles"></a></p>
+<h2 id="numeric-quantiles">Numeric Quantiles</h2>
<p>Consider this real data example of a stream of 230 million time-spent
events collected from one our systems for a period of just 30 minutes. Each
event records the amount of time in milliseconds that a user spends on a web
page before moving to a different web page by taking some action, such as a
click.</p>
@@ -517,7 +521,8 @@ System.out.println(qs3.toString()); //Primarily for
debugging
*/
</code></pre></div></div>
-<h2 id="extending-generic-quantiles-classes"><a name="Section 2"></a>Extending
Generic Quantiles Classes</h2>
+<p><a id="extending"></a></p>
+<h2 id="extending-generic-quantiles-classes">Extending Generic Quantiles
Classes</h2>
<p>Any item type that is comparable, or for which you can create a Comparator,
can also be analyzed by extending the abstract generic classes for that
particular item.</p>
@@ -583,7 +588,8 @@ MyItem[] itemSplitPoints =
sketch.getQuantiles(rankFractions);
<p>Using a simple binary search you can now split your data into the 10
partitions.</p>
-<h2 id="implementation-notes"><a name="Section 3"></a>Implementation Notes</h2>
+<p><a id="notes"></a></p>
+<h2 id="implementation-notes">Implementation Notes</h2>
<p>The quantiles algorithm is an implementation of the Low Discrepancy
Mergeable Quantiles Sketch, using double values, described in section 3.2 of
the journal version of the paper “Mergeable Summaries” by Agarwal, Cormode,
Huang, Phillips, Wei, and Yi.
<a href="http://dblp.org/rec/html/journals/tods/AgarwalCHPWY13"></a> <!-- does
not work with https --></p>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]