This is an automated email from the ASF dual-hosted git repository.

alsay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-datasketches-postgresql.git


The following commit(s) were added to refs/heads/master by this push:
     new 524521b  KLL and Frequent Strings merge examples
524521b is described below

commit 524521b8f3a61ade76454debcabbc89ab1a8532d
Author: AlexanderSaydakov <[email protected]>
AuthorDate: Tue Jul 2 17:32:57 2019 -0700

    KLL and Frequent Strings merge examples
---
 README.md | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9ada639..5c70b5e 100644
--- a/README.md
+++ b/README.md
@@ -146,8 +146,9 @@ Non-aggregate union:
 <h2>Estimating quantiles, ranks and histograms with KLL sketch</h2>
 
 Table "normal" has 1 million values from the normal (Gaussian) distribution with mean=0 and stddev=1.
-We can build a sketch, which represents the distribution (create table kll\_float\_sketch\_test(sketch kll\_float\_sketch)):
+We can build a sketch, which represents the distribution:
 
+       create table kll_float_sketch_test(sketch kll_float_sketch);
        $ psql test -c "insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal"
        INSERT 0 1
 
@@ -190,6 +191,15 @@ In this simple example we know the value of N since we constructed this sketch,
 
 Note that the normal distribution was used just to show the basic usage. The sketch does not make any assumptions about the distribution.
 
+Let's create two more sketches to show merging kll_float_sketch:
+
+       insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal;
+       insert into kll_float_sketch_test select kll_float_sketch_build(value) from normal;
+       select kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5) from kll_float_sketch_test;
+        kll_float_sketch_get_quantile
+       -------------------------------
+                           0.00332207
+
 <h2>Frequent strings</h2>
 
 Consider a numeric Zipfian distribution with parameter alpha=1.1 (high skew)
@@ -248,4 +258,26 @@ Here is an equivalent exact computation:
        real    0m18.362s
 
 In this particular case the exact computation happens to be faster. This is
-just to show the basic usage. Most importantly, the sketch can be used as an "additive" metric in a data cube, and can be easily merged across dimensions.
\ No newline at end of file
+just to show the basic usage. Most importantly, the sketch can be used as an "additive" metric in a data cube, and can be easily merged across dimensions.
+
+Merging frequent_strings_sketch:
+
+       create table frequent_strings_sketch_test(sketch frequent_strings_sketch);
+       insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+       insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+       insert into frequent_strings_sketch_test select frequent_strings_sketch_build(9, value) from zipf_1p1_8k_100m;
+       select frequent_strings_sketch_result_no_false_negatives(frequent_strings_sketch_merge(9, sketch), 3000000) from frequent_strings_sketch_test;
+        frequent_strings_sketch_result_no_false_negatives
+       ---------------------------------------------------
+        (1,45986859,45627006,45986859)
+        (2,21468195,21108342,21468195)
+        (3,13735083,13375230,13735083)
+        (4,10004424,9644571,10004424)
+        (5,7825689,7465836,7825689)
+        (6,6407145,6047292,6407145)
+        (7,5405883,5046030,5405883)
+        (8,4672299,4312446,4672299)
+        (9,4105338,3745485,4105338)
+        (10,3649596,3289743,3649596)
+        (11,3294912,2935059,3294912)
+       (11 rows)
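
As a reading aid for the merged frequent_strings_sketch output in the diff, here is a minimal Python sketch that is not part of the commit. It assumes each result row is (item, estimate, lower_bound, upper_bound). It also assumes the usual DataSketches error-type semantics: a no-false-negatives query returns items whose upper bound exceeds the threshold, while a no-false-positives query would require the lower bound to exceed it. The helper names are hypothetical and exist only for this illustration.

```python
# Rows copied from the merged-sketch output in the diff.
# Assumed column order: (item, estimate, lower_bound, upper_bound).
ROWS = [
    ("1", 45986859, 45627006, 45986859),
    ("2", 21468195, 21108342, 21468195),
    ("3", 13735083, 13375230, 13735083),
    ("4", 10004424, 9644571, 10004424),
    ("5", 7825689, 7465836, 7825689),
    ("6", 6407145, 6047292, 6407145),
    ("7", 5405883, 5046030, 5405883),
    ("8", 4672299, 4312446, 4672299),
    ("9", 4105338, 3745485, 4105338),
    ("10", 3649596, 3289743, 3649596),
    ("11", 3294912, 2935059, 3294912),
]

THRESHOLD = 3_000_000  # same threshold as in the query


def bound_widths(rows):
    """Return the set of distinct (upper_bound - lower_bound) widths."""
    return {upper - lower for _, _, lower, upper in rows}


def no_false_negatives(rows, threshold):
    """Keep rows whose upper bound exceeds the threshold (assumed semantics)."""
    return [row for row in rows if row[3] > threshold]


def no_false_positives(rows, threshold):
    """Keep rows whose lower bound exceeds the threshold (assumed semantics)."""
    return [row for row in rows if row[2] > threshold]


if __name__ == "__main__":
    print("distinct bound widths:", bound_widths(ROWS))
    print("no false negatives:", len(no_false_negatives(ROWS, THRESHOLD)), "rows")
    print("no false positives:", len(no_false_positives(ROWS, THRESHOLD)), "rows")
```

Under these assumptions, every row has the same bound width. Item 11 is returned only because its upper bound is above 3000000 while its lower bound is below it, so a stricter lower-bound filter over these same rows would keep items 1 through 10.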

