(datasketches-website) 01/01: updated overview of quantile sketches

alsay Thu, 24 Oct 2024 15:58:11 -0700

This is an automated email from the ASF dual-hosted git repository.

alsay pushed a commit to branch update_quantiles_overview
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git


commit 83a5a3d82c41a9e684b5203193ed566d16827714
Author: AlexanderSaydakov <[email protected]>
AuthorDate: Thu Oct 24 15:57:27 2024 -0700

    updated overview of quantile sketches
---
 docs/QuantilesAll/QuantilesOverview.md | 113 +++++++++++++++++----------------
 1 file changed, 60 insertions(+), 53 deletions(-)

diff --git a/docs/QuantilesAll/QuantilesOverview.md 
b/docs/QuantilesAll/QuantilesOverview.md
index 68fc3682..cbbdcab4 100644
--- a/docs/QuantilesAll/QuantilesOverview.md
+++ b/docs/QuantilesAll/QuantilesOverview.md
@@ -21,7 +21,7 @@ layout: doc_page
 -->
 # Introduction to the Quantile Sketches
 
-This is a quick overview of the quantiles sketches in the library. Each of 
these quantile types may have one or more specific implementaions and some 
variation in API depending on the language. Three of the quantile sketches have 
mathematically provable error bounds while the fourth is an empirical algorithm.
+This is an overview of the quantiles sketches in the library. Each of these 
quantile types may have one or more specific implementaions and some variation 
in API depending on the language. Three of the quantile sketches have 
mathematically provable error bounds while the fourth is an empirical algorithm.
 
 The three sketches with mathematically provable error bounds are:
 
@@ -29,84 +29,91 @@ The three sketches with mathematically provable error 
bounds are:
 * The KLL quantile sketch family
 * The REQ quantile sketch
 
-The one empirical quantile sketch is the T-Digest sketch. 
+The one empirical quantile sketch is the T-Digest sketch.
 
-The mathematical error bounds of the Classic, KLL and REQ sketches are 
specified with respect to rank and not with respect to quantiles. 
+The mathematical error bounds of the Classic, KLL and REQ sketches are 
specified with respect to rank and not with respect to quantiles.
 
 The T-Digest is empirical and has no mathematical basis for estimating its 
error and its results are dependent on the input data. However, for many common 
data distributions, it can produce excellent results. Please refer to the 
spcific documentation about the T-Digest sketch.
 
 For the Classic and KLL sketches, the difference between the rank upper bound 
and the rank lower bound is a 99% confidence interval and is an additive 
constant for all normalized ranks between 0.0 and 1.0. The specific error is a 
function of the parameter <i>K</i> of the sketch and can be derived from the 
sketch.  For example, if the rank error for a given K is 1%, then the error of 
a result rank of .01 is +/- .01 with a 99% confidence; the error of a result 
rank of .99 is +/- .01 with a [...]
 
-The REQ sketch is special in that it's error is also relative to the actual 
result rank (thus its name: Relative Error Quantiles). It was designed to 
proved very high rank accuacy for either the high end of the range of ranks 
(close to 1.0) or, based on the user's choice, the low end of ranks (close to 
0.0). Please refer to the spcific documentation about the REQ sketch. 
+The REQ sketch is special in that its error is also relative to the actual 
result rank (thus its name: Relative Error Quantiles). It was designed to 
proved very high rank accuacy for either the high end of the range of ranks 
(close to 1.0) or, based on the user's choice, the low end of ranks (close to 
0.0). Please refer to the spcific documentation about the REQ sketch.
 
-Although upper and lower quantile bounds can be approximately computed from 
the upper and lower rank bounds, and the difference between the quantile bounds 
is also an approximate confidence interval, the size of the quantile confidence 
interval may not be meaningful and is not constrained by the defined rank error 
of the sketch.  
+Although upper and lower quantile bounds can be approximately computed from 
the upper and lower rank bounds, and the difference between the quantile bounds 
is also an approximate confidence interval, the size of the quantile confidence 
interval may not be meaningful and is not constrained by the defined rank error 
of the sketch.
 
-For example, a 1% rank error at a result rank of .50 corresponds to a rank 
range of .49 to .51. However, in the quantile domain, there could be step 
function in the values such that the difference in the quantiles corresponding 
to ranks .49 and .51 are as large as the the entire dynamic range of all 
quantiles. It is possible to retrieve the upper and lower quantile bounds at a 
specific rank, but the difference between these quantile bounds may not be 
meaningful. 
+For example, a 1% rank error at a result rank of .50 corresponds to a rank 
range of .49 to .51. However, in the quantile domain, there could be step 
function in the values such that the difference in the quantiles corresponding 
to ranks .49 and .51 are as large as the the entire dynamic range of all 
quantiles. It is possible to retrieve the upper and lower quantile bounds at a 
specific rank, but the difference between these quantile bounds may not be 
meaningful.
 
-These sketches have many parallel methods. Please refer to the sketch API 
documentation by language for more information. 
-
-### The Classic Quantiles Sketch
+These sketches have many parallel methods. Please refer to the sketch API 
documentation by language for more information.
 
+### Classic Quantiles Sketch
 * Java
        * Repo: <https://github.com/apache/datasketches-java>
-       * Package: org.apache.datasketches.quantiles
+       * Package: 
[org.apache.datasketches.quantiles](https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/quantiles)
+       * Dedicated *double* and generic *item* implentations for arbitrary 
comparable types.
+               * The *double* implementation can be configured for off-heap 
operation.
 * C++
        * Repo: <https://github.com/apache/datasketches-cpp>
+       * Directory: 
[quantiles](https://github.com/apache/datasketches-cpp/tree/master/quantiles)
+       * Template implementation for arbitrary comparable types.
 * Python
-    * Repo: <https://github.com/apache/datasketches-python> 
-* Key Features (both Java & C++)
-    * User selectable search criteria QuantileSearchCriteria: 
-               * Exclusive, which is compatible with the KLL and older 
Quantiles Sketch
-               * Inclusive, a common definition in some of the theoretical 
literature. 
-       * Accuracy %: a function of *K* and independent of rank. 
-       * Dedicated *double* and generic *item* implentations for arbitrary 
comparable types. 
-       * The *double* implementation can be configured for off-heap operation.
-
-
-### The KLL Sketch
-
+    * Repo: <https://github.com/apache/datasketches-python>
+       * Dedicated *float*, *double*, *integer* and arbitrary Python object 
implementations
+* Key Features (Java, C++ and Python)
+       * Accuracy %: a function of *K* and independent of rank.
+    * User selectable search criteria QuantileSearchCriteria:
+               * Inclusive, a common definition in some of the theoretical 
literature (default).
+               * Exclusive, which is compatible with older Quantiles Sketch.
+
+### KLL Sketch
 * Java
-       * Repo: <https://github.com/apache/datasketches-java>:
-       * Package: org.apache.datasketches.kll
+       * Repo: <https://github.com/apache/datasketches-java>
+       * Package: 
[org.apache.datasketches.kll](https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/kll)
+       * Dedicated *float*, *double*, *long*, and generic *item* 
implementations.
+               * The *float*, *double*, and *long* implementations can be 
configured for off-heap operation.
 * C++
        * Repo: <https://github.com/apache/datasketches-cpp>
-       * Directory: kll
+       * Directory: 
[kll](https://github.com/apache/datasketches-cpp/tree/master/kll)
+       * Template implementation for arbitrary comparable types.
 * Python
-    * Repo: <https://github.com/apache/datasketches-python>  
-* Key Features (both Java & C++)
+    * Repo: <https://github.com/apache/datasketches-python>
+       * Dedicated *float*, *double*, *integer* and arbitrary Python object 
implementations
+* Key Features (Java, C++ and Python)
+       * Accuracy %: a function of *K* and independent of rank.
        * User selectable comparison QuantileSearchCriteria: 
-               * Exclusive, which is compatible with the KLL and older 
Quantiles Sketch
-               * Inclusive, a common definition in some of the theoretical 
literature.
-       * Accuracy %: a function of *K* and independent of rank. 
-       * Near optimal accuracy per sketch size compared to other constant 
accuracy quantiles sketches. 
-       * Java: Dedicated *float*, *double*, *long*, and generic *item* 
implementations.
-       * The *float*, *double*, and *long* implementations can be configured 
for off-heap operation.
-       * C++: Template implementation for arbitrary comparible types.
-       * Python: Dedicated *float* and *integer* implementations
-
-### The REQ Sketch
+               * Inclusive, a common definition in some of the theoretical 
literature (default).
+               * Exclusive, which is compatible with older Quantiles Sketch.
+       * Near optimal accuracy per sketch size compared to other constant 
accuracy quantiles sketches.
 
+### REQ Sketch
 * Java 
        * Repo: <https://github.com/apache/datasketches-java>
-       * Package: org.apache.datasketches.req
+       * Package: 
[org.apache.datasketches.req](https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/req)
+       * Dedicated *float* implementation.
 * C++
        * Repo: <https://github.com/apache/datasketches-cpp>
-       * Directory: req
+       * Directory: 
[req](https://github.com/apache/datasketches-cpp/tree/master/req)
+       * Template implementation for arbitrary comparable types.
 * Python
     * Repo: <https://github.com/apache/datasketches-python>
-
-* Key Features (both Java & C++)
+       * Dedicated *float*, *integer* and arbitrary Python object 
implementations
+* Key Features (Java, C++ and Python)
        * Accuracy %: a function of *K* and **relative** with respect to rank. 
The user can select either High Rank Accuracy (HRA) or Low Rank Accuracy (LRA). 
This enables extremely high accuracy for the ends of the rank domain. E.g., 
99.999%ile quantiles.
-       * User selectable comparison QuantileSearchCriteria: 
-               * Exclusive, which is compatible with the KLL and older 
Quantiles Sketch
-               * Inclusive, a common definition in some of the theoretical 
literature.
-       * Java: Dedicated *float* implementation.
-       * C++: Template implementation for arbitrary comparible types.
-       * Python: Dedicated *float* and *integer* implementations.
-
-       
-
-
-
-
+       * User selectable comparison QuantileSearchCriteria:
+               * Inclusive, a common definition in some of the theoretical 
literature (default).
+               * Exclusive, which is compatible with older Quantiles Sketch.
 
+### T-Digest
+* Java 
+       * Repo: <https://github.com/apache/datasketches-java>
+       * Package: 
[org.apache.datasketches.req](https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/tdigest)
+       * Dedicated *double* implementation.
+* C++
+       * Repo: <https://github.com/apache/datasketches-cpp>
+       * Directory: 
[req](https://github.com/apache/datasketches-cpp/tree/master/tdigest)
+       * Template implementation for *float* and *double* types.
+* Python
+    * Repo: <https://github.com/apache/datasketches-python>
+       * Dedicated *float* and *double* implementations
+* Key Features (Java, C++ and Python)
+       * Works on numeric (floating point) types only.
+       * Accuracy: a function of *K* and **relative** with respect to rank. 
Prioritizes both high rank accuracy and low rank accuracy with lower accuracy 
in the middle.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(datasketches-website) 01/01: updated overview of quantile sketches

Reply via email to