This is an automated email from the ASF dual-hosted git repository.
leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/master by this push:
new 3ebabf80 Added more detail to Sketching Quantiles and Ranks Tutorial.
3ebabf80 is described below
commit 3ebabf80577c2c5f830f5540c1140c3b8c55270a
Author: Lee Rhodes <[email protected]>
AuthorDate: Tue Sep 6 16:50:01 2022 -0700
Added more detail to Sketching Quantiles and Ranks Tutorial.
---
.../SketchingQuantilesAndRanksTutorial.md | 244 +++++++++++++++++----
1 file changed, 202 insertions(+), 42 deletions(-)
diff --git a/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
index b9f54ac8..d867e892 100644
--- a/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
+++ b/docs/Quantiles/SketchingQuantilesAndRanksTutorial.md
@@ -128,77 +128,183 @@ One can find examples of the following definitions in
the research literature.
These next examples use a small data set that mimics what could be the result
of both duplication and sketch data deletion.
+## The rules for returned quantiles or ranks
+
+* **Rule 1:** Every Quantile that exists in the input stream or retained by
the sketch has an associated Rank.
+
+* **Rule 2:** All of our quantile sketches only retain quantiles that exist in
the actual input stream of quantiles.
+
+* **Rule 3:** For the *getQuantile(rank)* queries, all of our quantile
sketches only return quantiles that were retained by the sketch. (i.e, we do
not interpolate between quantiles.)
+
+* **Rule 4:** For the *getRank(quantile)* queries, all of our quantile
sketches only return ranks that are associated with quantiles retained by the
sketch. (i.e, we do not interpolate between ranks.)
+
+* **Rule 5:** All of our quantile algorithms compensate for quantiles removed
during the sketch quantile selection and compression process by increasing the
weights of some of the quantiles not selected for removal, such that:
+
+ * The sum of the natural weights of all quantiles retained by the sketch
equals **N**, the total count of all quantiles given to the sketch.
+ * And by corollary, the largest quantile, when sorted by cumulative rank,
has a cumulative natural rank of **N**, or equivalently, a cumulative
normalized rank of **1.0**.
+
+
## The rank functions with inequalities
+### ***rank(quantile, INCLUSIVE)*** or ***r(q, LE)*** :=<br>Given *q*, return
the rank, *r*, of the largest quantile that is less than or equal to *q*.
+
+<b>Implementation:</b>
+
+* Given *q*, search the quantile array until we find the adjacent pair *{q1,
q2}* where *q1 <= q < q2*.
+* Return the rank, *r*, associated with *q1*, the first of the pair.
+
+<b>Boundary Exceptions:</b>
+
+* **Boundary Rule 1:** If the given *q* is *>=* the quantile associated with
the largest cumulative rank retained by the sketch, the function will return
the largest cumulative rank, *1.0*.
+* **Boundary Rule 2:** If the given *q* is *<* the quantile associated with
the smallest cumulative rank retained by the sketch, the function will return a
rank of *0.0*.
+
+#### Examples using normalized ranks:</b>
+
+* *r(30) = .786* Normal rule applies: *30 <= 30 < 40*, return *r(q1) = .786*.
+
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0 |
+| Quantile input | | | | | | 30 | | |
+| Qualifying pair | | | | | | q1 | q2 | |
+| Rank result | | | | | | .786 | | |
+
+
+* *r(55) = 1.0* Use Boundary Rule 1: *50 <= 55*, return *1.0*.
+
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50
| ? |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14
| |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0
| |
+| Quantile input | | | | | | | | 55
| |
+| Qualifying pair | | | | | | | | q1
| (q2) |
+| Rank result | | | | | | | | 1.0
| |
+
+
+* *r(5) = 0.0* Use Boundary Rule 2: *5 < 10*, return *0.0*.
+
+| Quantile[]: | ? | 10 | 20 | 20 | 30 | 30 | 30 | 40
| 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | | 1 | 3 | 5 | 7 | 9 | 11 | 13
| 14 |
+| Normalized Rank[]: | | .071 | .214 | .357 | .500 | .643 | .786 | .929
| 1.0 |
+| Quantile input | 5 | | | | | | |
| |
+| Qualifying pair | (q1) | q2 | | | | | |
| |
+| Rank result | 0 | | | | | | |
| |
+
+
+--------
+
### ***rank(quantile, EXCLUSIVE)*** or ***r(q, LT)*** :=<br>Given *q*, return
the rank, *r*, of the largest quantile that is strictly *Less Than* *q*.
<b>Implementation:</b>
-Given *q*, search the quantile array until we find the adjacent pair *{q1,
q2}* where *q1 < q <= q2*. Return the rank, *r*, associated with *q1*, the
first of the pair.
-<b>Boundary Notes:</b>
+* Given *q*, search the quantile array until we find the adjacent pair *{q1,
q2}* where *q1 < q <= q2*.
+* Return the rank, *r*, associated with *q1*, the first of the pair.
+
+<b>Boundary Exceptions:</b>
-* If the given *q* is larger than the largest quantile retained by the sketch,
the sketch will return the rank of the largest retained quantile.
-* If the given *q* is smaller than the smallest quantile retained by the
sketch, the sketch will return a rank of zero.
+* **Boundary Rule 1:** If the given *q* is *>* the quantile associated with
the largest cumulative rank retained by the sketch, the sketch will return the
the largest cumulative rank, *1.0*.
+* **Boundary Rule 2:** If the given *q* is *<=* the quantile associated with
the smallest cumulative rank retained by the sketch, the sketch will return a
rank of *0.0*.
<b>Examples using normalized ranks:</b>
-* *r(55) = 1.0*
-* *r(5) = 0.0*
-* *r(30) = .357* (Illustrated in table)
+* *r(30) = .357* Normal rule applies: *20 < 30 <= 30*, return *r(q1) = .357*.
| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.000 |
-| Quantile input | | | | 30 | 30 | 30 | | |
+| Quantile input | | | | 30 | | | | |
| Qualifying pair | | | q1 | q2 | | | | |
| Rank result | | | .357 | | | | | |
---------
+* *r(55) = 1.0* Use Boundary Rule 1: *50 < 55*, return *1.0*.
-### ***rank(quantile, INCLUSIVE)*** or ***r(q, LE)*** :=<br>Given *q*, return
the rank, *r*, of the largest quantile that is less than or equal to *q*.
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50
| ? |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14
| |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0
| |
+| Quantile input | | | | | | | |
| 55 |
+| Qualifying pair | | | | | | | | q1
| (q2) |
+| Rank result | | | | | | | | 1.000
| |
+
+* *r(5) = 0.0* Use Boundary Rule 2: *5 <= 10*, return *0*.
+
+| Quantile[]: | ? | 10 | 20 | 20 | 30 | 30 | 30 | 40
| 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | | 1 | 3 | 5 | 7 | 9 | 11 | 13
| 14 |
+| Normalized Rank[]: | | .071 | .214 | .357 | .500 | .643 | .786 | .929
| 1.0 |
+| Quantile input | 5 | | | | | | |
| |
+| Qualifying pair | (q1) | q2 | | | | | |
| |
+| Rank result | 0 | | | | | | |
| |
+
+
+
+## The quantile functions with inequalities
+
+### ***quantile(rank, INCLUSIVE)*** or ***q(r, GE)*** :=<br>Given *r*, return
the quantile, *q*, of the smallest rank that is strictly Greater than or Equal
to *r*.
<b>Implementation:</b>
-Given *q*, search the quantile array until we find the adjacent pair *{q1,
q2}* where *q1 <= q < q2*. Return the rank, *r*, associated with *q1*, the
first of the pair.
-<b>Boundary Notes:</b>
+* Given *r*, search the rank array until we find the adjacent pair *{r1, r2}*
where *r1 < r <= r2*.
+* Return the quantile, *q*, associated with *r2*, the second of the pair.
-* If the given *q* is larger than the largest quantile retained by the sketch,
the function will return the rank of the largest retained quantile.
-* If the given *q* is smaller than the smallest quantile retained by the
sketch, the function will return a rank of zero.
+<b>Boundary Exceptions:</b>
+
+* **Boundary Rule 2:** If the given normalized rank, *r*, is *<=* the smallest
rank, the function will return the **quantile** associated with the smallest
cumulative rank.
<b>Examples using normalized ranks:</b>
-* *r(55) = 1.0*
-* *r(5) = 0.0*
-* *r(30) = .786* (Illustrated in table)
+* *q(.786) = 30* Normal rule applies: *.643 < .786 <= .786*, return *q(r2) =
30*.
| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.000 |
-| Quantile input | | | | 30 | 30 | 30 | | |
-| Qualifying pair | | | | | | q1 | q2 | |
-| Rank result | | | | | | .786 | | |
+| Rank input | | | | | | .786 | | |
+| Qualifying pair | | | | | r1 | r2 | | |
+| Quantile result | | | | | | 30 | | |
+* *q(1.0) = 50* Normal rule applies: *.929 < 1.0 <= 1.0*, return *q(r2) = 50*.
-## The quantile functions with inequalities
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0 |
+| Rank input | | | | | | | | 1.0 |
+| Qualifying pair | | | | | | | r1 | r2 |
+| Quantile result | | | | | | | | 50 |
+
+* *q(0.0 <= .071) = 10* Use Boundary Rule 2: *0.0 <= .071*, return *10*.
+
+| Quantile[]: | ? | 10 | 20 | 20 | 30 | 30 | 30 | 40
| 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | | 1 | 3 | 5 | 7 | 9 | 11 | 13
| 14 |
+| Normalized Rank[]: | | .071 | .214 | .357 | .500 | .643 | .786 | .929
| 1.0 |
+| Rank input | 0.0 | | | | | | |
| |
+| Qualifying pair | (r1) | r2 | | | | | |
| |
+| Rank result | | 10 | | | | | |
| |
+
+
+--------
### ***quantile(rank, EXCLUSIVE)*** or ***q(r, GT)*** :=<br>Given *r*, return
the quantile, *q*, of the smallest rank that is strictly Greater Than *r*.
<b>Implementation:</b>
-Given *r*, search the rank array until we find the adjacent pair *{r1, r2}*
where *r1 <= r < r2*. Return the quantile associated with *r2*, the second of
the pair.
-<b>Boundary Notes:</b>
+* Given *r*, search the rank array until we find the adjacent pair *{r1, r2}*
where *r1 <= r < r2*.
+* Return the quantile, *q*, associated with *r2*, the second of the pair.
+
+<b>Boundary Exceptions:</b>
-* If the given normalized rank, *r*, is equal to 1.0, there is no quantile
that satisfies this criterion. However, for convenience, the function will
return the largest quantile retained by the sketch.
-* If the given normalized rank, *r*, is less than the smallest rank, the
function will return the smallest quantile.
+* **Boundary Rule 1:** If the given normalized rank, *r*, is equal to 1.0,
there is no quantile that satisfies this criterion. However, for convenience,
the function will return quantile associated with the largest cumulative rank
retained by the sketch.
+* **Boundary Rule 2:** If the given normalized rank, *r*, is less than the
smallest rank, the function will return the quantile associated with the
smallest cumulative rank retained by the sketch.
<b>Examples using normalized ranks:</b>
-* *q(1.0) = 50*
-* *q(0.0) = 10*
-* *q(.357) = 30* (Illustrated in table)
+* *q(.357) = 30* Normal rule applies: *.357 <= .357 < .500*, return *q(r2) =
30*.
| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
@@ -208,40 +314,94 @@ Given *r*, search the rank array until we find the
adjacent pair *{r1, r2}* wher
| Qualifying pair | | | r1 | r2 | | | | |
| Quantile result | | | | 30 | | | | |
---------
+* *q(1.0) = 50* Use Boundary Rule 1 *1.0 <= 1.0 < ?*, return *50*.
-### ***quantile(rank, EXCLUSIVE_STRICT)*** or ***q(r, GT_STRICT)***
:=<br>Given *r*, return the quantile, *q*, of the smallest rank that is
strictly Greater Than *r*.
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50
| ? |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14
| |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0
| |
+| Rank input | | | | | | | | 1.0
| |
+| Qualifying pair | | | | | | | | r1
| (r2) |
+| Quantile result | | | | | | | |
| 50 |
+
+* *q(0.99) = 50* Normal rule applies *.929 <= .99 < 1.0*, return *50*.
+
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0 |
+| Rank input | | | | | | | .99 | |
+| Qualifying pair | | | | | | | r1 | r2 |
+| Quantile result | | | | | | | | 50 |
-In <b>STRICT</b> mode, the only difference is the following:
-<b>Boundary Notes:</b>
+* *q(0.0 <= .071) = 10* Use Boundary Rule 2: *0.0 < .071*, return *10*.
-* If the given normalized rank, *r*, is equal to 1.0, there is no quantile
that satisfies this criterion. The function will return *NaN*.
+| Quantile[]: | ? | 10 | 20 | 20 | 30 | 30 | 30 | 40
| 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | | 1 | 3 | 5 | 7 | 9 | 11 | 13
| 14 |
+| Normalized Rank[]: | | .071 | .214 | .357 | .500 | .643 | .786 | .929
| 1.0 |
+| Rank input | 0.0 | | | | | | |
| |
+| Qualifying pair | (r1) | r2 | | | | | |
| |
+| Rank result | | 10 | | | | | |
| |
--------
-### ***quantile(rank, INCLUSIVE)*** or ***q(r, GE)*** :=<br>Given *r*, return
the quantile, *q*, of the smallest rank that is strictly Greater than or Equal
to *r*.
+### ***quantile(rank, EXCLUSIVE_STRICT)*** or ***q(r, GT_STRICT)***
:=<br>Given *r*, return the quantile, *q*, of the smallest rank that is
strictly Greater Than *r*.
<b>Implementation:</b>
-Given *r*, search the rank array until we find the adjacent pair *{r1, r2}*
where *r1 < r <= r2*. Return the quantile, *q*, associated with *r2*, the
second of the pair.
-<b>Boundary Notes:</b>
+* Given *r*, search the rank array until we find the adjacent pair *{r1, r2}*
where *r1 <= r < r2*.
+* Return the quantile, *q*, associated with *r2*, the second of the pair.
+
+<b>Boundary Exceptions:</b>
-* If the given normalized rank, *r*, is equal to 1.0, the function will return
the largest quantile retained by the sketch.
-* If the given normalized rank, *r*, is less than the smallest rank, the
function will return the smallest quantile.
+* **Boundary Rule 1:** If the given normalized rank, *r*, is equal to *1.0*,
there is no quantile that satisfies this criterion. Return *NaN* or *null*.
+* **Boundary Rule 2:** If the given normalized rank, *r*, is less than the
smallest rank, the function will return the quantile associated with the
smallest cumulative rank retained by the sketch..
<b>Examples using normalized ranks:</b>
-For example *q(.786) = 30*
+* *q(.357) = 30* Normal rule applies: *.357 <= .357 < .500*, return *q(r2) =
30*.
| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.000 |
-| Rank input | | | | | | .786 | | |
-| Qualifying pair | | | | | r1 | r2 | | |
-| Quantile result | | | | | | 30 | | |
+| Rank input | | | .357 | | | | | |
+| Qualifying pair | | | r1 | r2 | | | | |
+| Quantile result | | | | 30 | | | | |
+
+* *q(1.0) = 50* Use Boundary Rule 1 *1.0 <= 1.0 < ?*, return *NaN or null*.
+
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50
| ? |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14
| |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0
| |
+| Rank input | | | | | | | | 1.0
| |
+| Qualifying pair | | | | | | | | r1
| (r2) |
+| Quantile result | | | | | | | |
| NaN or null |
+
+* *q(0.99) = 50* Normal rule applies *.929 <= .99 < 1.0*, return *50*.
+
+| Quantile[]: | 10 | 20 | 20 | 30 | 30 | 30 | 40 | 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- |
+| Natural Rank[]: | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 14 |
+| Normalized Rank[]: | .071 | .214 | .357 | .500 | .643 | .786 | .929 | 1.0 |
+| Rank input | | | | | | | .99 | |
+| Qualifying pair | | | | | | | r1 | r2 |
+| Quantile result | | | | | | | | 50 |
+
+* *q(0.0 <= .071) = 10* Use Boundary Rule 2: *0.0 < .071*, return *10*.
+
+| Quantile[]: | ? | 10 | 20 | 20 | 30 | 30 | 30 | 40
| 50 |
+| ------------------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | -----
| ----- |
+| Natural Rank[]: | | 1 | 3 | 5 | 7 | 9 | 11 | 13
| 14 |
+| Normalized Rank[]: | | .071 | .214 | .357 | .500 | .643 | .786 | .929
| 1.0 |
+| Rank input | 0.0 | | | | | | |
| |
+| Qualifying pair | (r1) | r2 | | | | | |
| |
+| Rank result | | 10 | | | | | |
| |
+
## These inequality functions maintain the 1:1 functional relationship
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]