(datasketches-website) branch asf-site updated: Automatic Site Publish by Buildbot

git-site-role Mon, 23 Sep 2024 18:46:16 -0700

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 7b5e665c Automatic Site Publish by Buildbot
7b5e665c is described below

commit 7b5e665c7f11c721a2eb6c6f82d5b0cb2a0b3585
Author: buildbot <[email protected]>
AuthorDate: Tue Sep 24 01:46:08 2024 +0000

    Automatic Site Publish by Buildbot
---
 output/docs/KLL/KLLSketch.html                     | 133 +++++++++++++++++----
 output/docs/Quantiles/ClassicQuantilesSketch.html  |  56 ++++++++-
 output/docs/Quantiles/QuantilesSketchOverview.html |  28 +++--
 3 files changed, 181 insertions(+), 36 deletions(-)

diff --git a/output/docs/KLL/KLLSketch.html b/output/docs/KLL/KLLSketch.html
index a0e13500..a6764edf 100644
--- a/output/docs/KLL/KLLSketch.html
+++ b/output/docs/KLL/KLLSketch.html
@@ -352,45 +352,116 @@
     specific language governing permissions and limitations
     under the License.
 -->
+<h2 id="contents">Contents</h2>
+<!-- TOC -->
+<ul>
+  <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesOverview.html";>Introduction
 to the Quantile Sketches</a></li>
+  <li><a href="#kll-sketch">Kll Sketch</a>
+    <ul>
+      <li><a href="#comparing">Comparing the KllSketches with the original 
classic Quantiles Sketches</a></li>
+      <li><a href="#plots">Plots for KllDoublesSketch vs. classic Quantiles 
DoublesSketch</a></li>
+      <li><a href="#simple-example">Simple Java KLL Example</a></li>
+    </ul>
+  </li>
+  <li><a 
href="https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html";>KLL 
Accuracy And Size</a></li>
+  <li><a 
href="https://datasketches.apache.org/docs/KLL/UnderstandingKLLBounds.html";>Understanding
 KLL Bounds</a></li>
+  <li>Examples
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/KLL/KLLCppExample.html";>KLL Sketch 
C++ Example</a></li>
+    </ul>
+  </li>
+  <li>Tutorials
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html";>Sketching
 Quantiles and Ranks Tutorial</a></li>
+    </ul>
+  </li>
+  <li>Theory
+    <ul>
+      <li><a 
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/Quantiles_KLL.pdf";>Optimal
 Quantile Approximation in Streams</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesReferences.html";>References</a>
+<!-- TOC --></li>
+    </ul>
+  </li>
+</ul>
+
+<p><a id="kll-sketch"></a></p>
 <h2 id="kll-sketch">KLL Sketch</h2>
 
-<p>Implementation of a very compact quantiles sketch with lazy compaction 
scheme and nearly optimal accuracy per bit.
-See <a href="https://arxiv.org/abs/1603.05346v2";>Optimal Quantile 
Approximation in Streams, by Zohar Karnin, Kevin Lang, Edo Liberty</a>.
-The name KLL is composed of the initial letters of the last names of the 
authors.</p>
+<p>This is an implementation of a very compact quantiles sketch with lazy 
compaction scheme and nearly optimal accuracy per bit of storage.  The 
underlying theoretical work is the paper 
+<a href="https://arxiv.org/abs/1603.05346v2";>Optimal Quantile Approximation in 
Streams, by Zohar Karnin, Kevin Lang, Edo Liberty</a>. The name KLL is composed 
of the initial letters of the last names of the authors.</p>
 
-<p>The usage of KllSketch is very similar to the classic quantiles 
DoublesSketch.</p>
+<p>This implementation includes 16 variations of the KLL Sketch, including a 
base KllSketch for methods common to all sketches. The implementation 
variations are across 3 different dimensions:</p>
 
 <ul>
-  <li>The key feature of this sketch is its compactness for a given 
accuracy.</li>
-  <li>It is separately implemented for both float and double values and can be 
configured for use on-heap or off-heap (Direct mode).</li>
-  <li>The parameter K that affects the accuracy and the size of the sketch is 
not restricted to powers of 2.</li>
-  <li>The default of 200 was chosen to yield approximately the same normalized 
rank error (1.65%) as the classic quantiles DoublesSketch (K=128, error 
1.73%).</li>
+  <li>Input type: double, float, long, item(generic)</li>
+  <li>Memory type: heap, direct (off-heap)</li>
+  <li>Stored Size: compact (read-only), updatable</li>
 </ul>
 
-<h3 id="java-example">Java example</h3>
+<p>With the one exception that the KllItemSketch is not available in direct, 
updatable form.
+The classes are organized in an inheritance hierarchy as follows:</p>
 
-<div class="highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
+<ul>
+  <li>Public KllSketch
+    <ul>
+      <li>Public KllDoublesSketch
+        <ul>
+          <li>KllHeapDoublesSketch</li>
+          <li>KllDirectDoublesSketch
+            <ul>
+              <li>KllDirectCompactDoublesSketch</li>
+            </ul>
+          </li>
+        </ul>
+      </li>
+      <li>Public KllFloatsSketch
+        <ul>
+          <li>KllHeapFloatsSketch</li>
+          <li>KllDirectFloatsSketch
+            <ul>
+              <li>KllDirectCompactFloatsSketch</li>
+            </ul>
+          </li>
+        </ul>
+      </li>
+      <li>Public KllItemsSketch<T>
+</T>        <ul>
+          <li>KllHeapItemsSketch<T></T></li>
+          <li>KllDirectCompactItemsSketch<T></T></li>
+        </ul>
+      </li>
+      <li>Public KllLongsSketch
+        <ul>
+          <li>KllHeapLongsSketch</li>
+          <li>KllDirectLongsSketch
+            <ul>
+              <li>KllDirectCompactLongsSketch</li>
+            </ul>
+          </li>
+        </ul>
+      </li>
+    </ul>
+  </li>
+</ul>
 
-KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
-int n = 1000000;
-for (int i = 0; i &lt; n; i++) {
-  sketch.update(i);
-}
-float median = sketch.getQuantile(0.5);
-double rankOf1000 = sketch.getRank(1000);
-</code></pre></div></div>
+<p>The internal package-private variations are constructed using static 
factory methods from the 4 outer public classes for doubles, floats, items, and 
longs, respectively</p>
 
-<h3 
id="differences-of-kllsketch-from-the-original-quantiles-doublessketch">Differences
 of KllSketch from the original quantiles DoublesSketch</h3>
+<p><a id="comparing"></a></p>
+<h3 
id="comparing-the-kll-sketches-with-the-original-classic-quantiles-sketches">Comparing
 the KLL Sketches with the original classic Quantiles Sketches</h3>
+
+<p>The usage of KllDoublesSketch is very similar to the classic quantiles 
DoublesSketch.</p>
 
 <ul>
-  <li>KLL has a smaller size for the same accuracy.</li>
-  <li>KLL is slightly faster to update.</li>
-  <li>The KLL parameter K doesn’t have to be power of 2.</li>
-  <li>KLL operates with either float values or double values.</li>
+  <li>The key feature of KLL sketch is its compactness for a given accuracy. 
KLL has a much smaller size for the same accuracy (see the plots below).</li>
+  <li>The KLL parameter K, which affects accuracy and size, doesn’t have to be 
power of 2. The default K of 200 was chosen to yield approximately the same 
normalized rank error (1.65%) as the classic quantiles DoublesSketch (K=128, 
error 1.73%).</li>
+  <li>The classic quantiles sketch only has double and item(generic) input 
types, while KLL (as mentioned above) is implemented with double, float, long, 
and item(generic) types.</li>
   <li>KLL uses a merge method rather than a union object.</li>
 </ul>
 
-<p>The starting point for the comparison is setting K in such a way that rank 
error would be approximately the same. As pointed out above, the default K for 
both sketches should achieve this. Here is the comparison of the single-sided 
normalized rank error (getRank() method) for the default K:</p>
+<p><a id="plots"></a></p>
+<h3 
id="plot-comparisons-of-klldoublessketch-vs-classic-quantiles-doublessketch">Plot
 Comparisons of KllDoublesSketch vs. classic Quantiles DoublesSketch</h3>
+
+<p>The starting point for the plot comparisons is setting K in such a way that 
rank error would be approximately the same. As pointed out above, the default K 
for both sketches should achieve this. Here is the comparison of the 
single-sided normalized rank error (getRank() method) for the default K:</p>
 
 <p><img class="doc-img-full" 
src="/docs/img/kll/kll200-vs-ds128-rank-error.png" alt="RankError" /></p>
 
@@ -412,6 +483,20 @@ double rankOf1000 = sketch.getRank(1000);
 
 <p><img class="doc-img-full" src="/docs/img/kll/kll200-vs-ds128-update.png" 
alt="UpdateTime" /></p>
 
+<p><a id="simple-example"></a></p>
+<h3 id="simple-java-kll-floats-example">Simple Java KLL Floats example</h3>
+
+<div class="highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
+
+KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
+int n = 1000000;
+for (int i = 0; i &lt; n; i++) {
+  sketch.update(i);
+}
+float median = sketch.getQuantile(0.5);
+double rankOf1000 = sketch.getRank(1000);
+</code></pre></div></div>
+
       </div> <!-- End content -->
     </div> <!-- End row -->
   </div> <!-- End Container -->
diff --git a/output/docs/Quantiles/ClassicQuantilesSketch.html 
b/output/docs/Quantiles/ClassicQuantilesSketch.html
index 23b37197..1f6850e6 100644
--- a/output/docs/Quantiles/ClassicQuantilesSketch.html
+++ b/output/docs/Quantiles/ClassicQuantilesSketch.html
@@ -352,8 +352,56 @@
     specific language governing permissions and limitations
     under the License.
 -->
+<h2 id="contents">Contents</h2>
+<!-- TOC -->
+<ul>
+  <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesOverview.html";>Introduction
 to the Quantile Sketches</a></li>
+  <li><a href="#classic-quantiles-sketch">Classic Quantiles Sketch</a>
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/Quantiles/QuantilesSketchOverview.html";>Classsic
 Quantiles Sketch Overview</a></li>
+      <li><a href="#accuracy-and-size">Accuracy and Size</a>
+        <ul>
+          <li><a href="#absolute-vs-relative-error">Absolute vs Relative 
Error</a></li>
+        </ul>
+      </li>
+      <li><a href="#data-independence">Quantiles Sketches and Data 
Independence</a></li>
+      <li><a href="#accuracy-table">Accuracy Table</a></li>
+      <li><a href="#accuracy-plots">Accuracy Plots</a></li>
+    </ul>
+  </li>
+  <li>Examples
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/Quantiles/QuantilesJavaExample.html";>Classic
 Quantiles Java Example</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/Quantiles/QuantilesPigUDFs.html";>Classic
 Quantiles Pig UDFs</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/Quantiles/QuantilesHiveUDFs.html";>Classic
 Quantiles Hive UDFs</a></li>
+    </ul>
+  </li>
+  <li>Classic Quantiles Studies
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesStudies/DruidApproxHistogramStudy.html";>Druid
 Approximate Histogram</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesStudies/MomentsSketchStudy.html";>Moments
 Sketch Study</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesStudies/QuantilesStreamAStudy.html";>Quantiles
 StreamA Study</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesStudies/ExactQuantiles.html";>Exact
 Quantiles for Studies</a></li>
+    </ul>
+  </li>
+  <li>Tutorials
+    <ul>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html";>Sketching
 Quantiles and Ranks Tutorial</a></li>
+    </ul>
+  </li>
+  <li>Theory
+    <ul>
+      <li><a href="https://arxiv.org/abs/2004.01668";>Relative Error Streaming 
Quantiles</a></li>
+      <li><a 
href="https://datasketches.apache.org/docs/QuantilesAll/QuantilesReferences.html";>More
 References</a>
+<!-- TOC --></li>
+    </ul>
+  </li>
+</ul>
+
+<p><a id="classic-quantiles-sketch"></a></p>
 <h1 id="classic-quantiles-sketch">Classic Quantiles Sketch</h1>
 
+<p><a id="accuracy-and-size"></a></p>
 <h2 id="quantiles-sketches-accuracy-and-size">Quantiles Sketches Accuracy and 
Size</h2>
 <p>Please review the Quantiles <a 
href="/docs/QuantilesAll/SketchingQuantilesAndRanksTutorial.html">Tutorial</a>.</p>
 
@@ -362,12 +410,14 @@ the overall size of the sketch.</p>
 
 <p>Accuracy for quantiles sketches is specified and measured with respect to 
the <em>rank</em> only, not the values.</p>
 
+<p><a id="absolute-vs-relative-error"></a></p>
 <h3 id="absolute-vs-relative-error">Absolute vs Relative Error</h3>
 <p>The Quantiles/DoublesSketch and the KLL Sketch have <em>absolute 
error</em>.  For example, a specified accuracy of 1% at the median (rank = 
0.50) means that the true value (if you could extract it from the set) should 
be 
 between <em>getQuantile(0.49)</em> and <em>getQuantile(0.51)</em>. This same 
1% error applied at a rank of 0.95 means that the true value should be between 
<em>getQuantile(0.94)</em> and <em>getQuantile(0.96)</em>. In other words, the 
error is a fixed +/- epsilon for the entire range of rank values.</p>
 
 <p>The ReqSketch, however, has relative rank error and the user can choose 
which end of the rank domain should have high accuracy.  Refer to the sketch 
documentation for more information.</p>
 
+<p><a id="data-independence"></a></p>
 <h2 id="quantiles-sketches-and-data-independence">Quantiles Sketches and Data 
Independence</h2>
 <p>A <em>sketch</em> is an implementation of a <em>streaming algorithm</em>. 
By definition, a sketch has only one chance to examine each item of the stream. 
 It is this property that makes a sketch a <em>streaming</em> algorithm and 
useful for real-time analysis of very large streams that may be impractical to 
actually store.</p>
 
@@ -375,7 +425,8 @@ between <em>getQuantile(0.49)</em> and 
<em>getQuantile(0.51)</em>. This same 1%
 
 <p>The only thing the user needs to know is how to extract the values from the 
stream so that they can be fed into the sketch. It is reasonable that the user 
knows the <em>type</em> of values in the stream: e.g., are they alphanumeric 
strings, numeric strings, or numeric primitives. These properties may determine 
the type of sketch to use as well as how to extract the appropriate quantities 
to feed into the sketch.</p>
 
-<h2 
id="accuracy-information-for-the-orgapachedatasketchesquantiles-sketch-package">Accuracy
 Information for the org.apache.datasketches.quantiles Sketch Package</h2>
+<p><a id="accuracy-table"></a></p>
+<h2 id="accuracy-table-for-the-classic-quantiles-sketch">Accuracy Table for 
the Classic Quantiles Sketch</h2>
 <p>A <i>k</i> of 256 produces a normalized rank error of less than 1%.
 For example, the median value returned from getQuantile(0.5) will be between 
the actual values
 from the hypothetically sorted array of input values at normalized ranks of 
0.49 and 0.51, with 
@@ -427,6 +478,9 @@ Table Guide for Quantiles DoublesSketch Size in Bytes and 
Approximate Error:
  4,294,967,295 |     3,744     7,200    13,856    26,656    51,232    98,336   
188,448   360,480   688,160 1,310,752 2,490,400 4,718,624
 </pre>
 
+<p><a id="accuracy-plots"></a></p>
+<h2 id="accuracy-plots">Accuracy Plots</h2>
+
 <p>The following graphs illustrate the ability of the Quantiles DoublesSketch 
to characterize value distributions.</p>
 
 <ul>
diff --git a/output/docs/Quantiles/QuantilesSketchOverview.html 
b/output/docs/Quantiles/QuantilesSketchOverview.html
index 0d26f0df..4d303ad9 100644
--- a/output/docs/Quantiles/QuantilesSketchOverview.html
+++ b/output/docs/Quantiles/QuantilesSketchOverview.html
@@ -352,7 +352,17 @@
     specific language governing permissions and limitations
     under the License.
 -->
-<h1 id="quantiles-sketch-overview">Quantiles Sketch Overview</h1>
+<h2 id="contents">Contents</h2>
+
+<ul>
+  <li><a href="#overview">Classic Quantiles Sketch Overview</a></li>
+  <li><a href="#numeric-quantiles">Numeric Quantiles</a></li>
+  <li><a href="#extending">Extending Generic Quantiles Classes</a></li>
+  <li><a href="#notes">Implementation Notes</a></li>
+</ul>
+
+<p><a id="overview"></a></p>
+<h1 id="classic-quantiles-sketch-overview">Classic Quantiles Sketch 
Overview</h1>
 
 <p>Package: org.apache.datasketches.quantiles</p>
 
@@ -365,14 +375,8 @@ Probability Mass Function, getPMF(), and the Cumulative 
Distribution Function, g
   <li><strong>NOTE:</strong> See also the <a 
href="/docs/KLL/KLLSketch.html">KLL Sketch</a>.</li>
 </ul>
 
-<h3 id="section-links">Section links:</h3>
-<ul>
-  <li><a href="#Section 1">Section 1</a> Numeric Quantiles</li>
-  <li><a href="#Section 2">Section 2</a> Extending Generic Quantiles 
Classes</li>
-  <li><a href="#Section 3">Section 3</a> Implementation Notes</li>
-</ul>
-
-<h2 id="numeric-quantiles"><a name="Section 1"></a>Numeric Quantiles</h2>
+<p><a id="numeric-quantiles"></a></p>
+<h2 id="numeric-quantiles">Numeric Quantiles</h2>
 
 <p>Consider this real data example of a stream of 230 million time-spent 
events collected from one our systems for a period of just 30 minutes. Each 
event records the amount of time in milliseconds that a user spends on a web 
page before moving to a different web page by taking some action, such as a 
click.</p>
 
@@ -517,7 +521,8 @@ System.out.println(qs3.toString()); //Primarily for 
debugging
 */
 </code></pre></div></div>
 
-<h2 id="extending-generic-quantiles-classes"><a name="Section 2"></a>Extending 
Generic Quantiles Classes</h2>
+<p><a id="extending"></a></p>
+<h2 id="extending-generic-quantiles-classes">Extending Generic Quantiles 
Classes</h2>
 
 <p>Any item type that is comparable, or for which you can create a Comparator, 
can also be analyzed by extending the abstract generic classes for that 
particular item.</p>
 
@@ -583,7 +588,8 @@ MyItem[] itemSplitPoints = 
sketch.getQuantiles(rankFractions);
 
 <p>Using a simple binary search you can now split your data into the 10 
partitions.</p>
 
-<h2 id="implementation-notes"><a name="Section 3"></a>Implementation Notes</h2>
+<p><a id="notes"></a></p>
+<h2 id="implementation-notes">Implementation Notes</h2>
 
 <p>The quantiles algorithm is an implementation of the Low Discrepancy 
Mergeable Quantiles Sketch, using double values, described in section 3.2 of 
the journal version of the paper “Mergeable Summaries” by Agarwal, Cormode, 
Huang, Phillips, Wei, and Yi. 
 <a href="http://dblp.org/rec/html/journals/tods/AgarwalCHPWY13";></a> <!-- does 
not work with https --></p>


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(datasketches-website) branch asf-site updated: Automatic Site Publish by Buildbot

Reply via email to