This is an automated email from the ASF dual-hosted git repository. alsay pushed a commit to branch one_link in repository https://gitbox.apache.org/repos/asf/datasketches-bigquery.git
commit 7af47d087f20c079d481f7bc8e0952d7db343b10 Author: AlexanderSaydakov <[email protected]> AuthorDate: Tue Oct 29 16:32:14 2024 -0700 one link per sketch type --- README.md | 234 ++++++++++++++++++++++++++++------------------------ README_template.md | 18 +++- readme_generator.py | 1 + 3 files changed, 141 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index 0bc141c..fd19898 100644 --- a/README.md +++ b/README.md @@ -81,22 +81,24 @@ accurate estimates with low memory usage and are particularly useful for applications like counting unique users, analyzing website traffic, or tracking distinct events. +For more information: [CPC Sketches](https://datasketches.apache.org/docs/CPC/CpcSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [cpc_sketch_agg_union](cpc/sqlx/cpc_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. 
Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_agg_string](cpc/sqlx/cpc_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES <br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_agg_int64](cpc/sqlx/cpc_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES <br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_agg_string_lgk_seed](cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compresse [...] -| [cpc_sketch_agg_union_lgk_seed](cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. 
Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the corre [...] -| [cpc_sketch_agg_int64_lgk_seed](cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed [...] -| [cpc_sketch_get_estimate](cpc/sqlx/cpc_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_to_string](cpc/sqlx/cpc_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_get_estimate_seed](cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate.<br><br>For more information:<br> \- 
https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_to_string_seed](cpc/sqlx/cpc_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketc [...] -| [cpc_sketch_union](cpc/sqlx/cpc_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/CPC/CpcSketches.html | -| [cpc_sketch_get_estimate_and_bounds](cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the re [...] 
-| [cpc_sketch_union_lgk_seed](cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured w [...] -| [cpc_sketch_get_estimate_and_bounds_seed](cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard devia [...] +| [cpc_sketch_agg_union](cpc/sqlx/cpc_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. 
| +| [cpc_sketch_agg_string](cpc/sqlx/cpc_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | +| [cpc_sketch_agg_int64](cpc/sqlx/cpc_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | +| [cpc_sketch_agg_string_lgk_seed](cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compresse [...] +| [cpc_sketch_agg_union_lgk_seed](cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the corre [...] 
+| [cpc_sketch_agg_int64_lgk_seed](cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed [...] +| [cpc_sketch_get_estimate](cpc/sqlx/cpc_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. | +| [cpc_sketch_to_string](cpc/sqlx/cpc_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | +| [cpc_sketch_get_estimate_seed](cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. | +| [cpc_sketch_to_string_seed](cpc/sqlx/cpc_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. 
| +| [cpc_sketch_union](cpc/sqlx/cpc_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES. | +| [cpc_sketch_get_estimate_and_bounds](cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the re [...] +| [cpc_sketch_union_lgk_seed](cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured w [...] +| [cpc_sketch_get_estimate_and_bounds_seed](cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard devia [...] **Examples:** @@ -178,12 +180,14 @@ frequencies of items in a dataset. 
They are effective for identifying the most frequent items, such as the top products purchased or the most popular search queries. +For more information: [Frequency Sketches](https://datasketches.apache.org/docs/Frequency/FrequencySketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [frequent_strings_sketch_merge](fi/sqlx/frequent_strings_sketch_merge.sqlx) | AGGREGATE | (sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Frequency/FrequencySketche [...] -| [frequent_strings_sketch_build](fi/sqlx/frequent_strings_sketch_build.sqlx) | AGGREGATE | (item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings Sk [...] 
-| [frequent_strings_sketch_to_string](fi/sqlx/frequent_strings_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Frequency/FrequencySketches.html | -| [frequent_strings_sketch_get_result](fi/sqlx/frequent_strings_sketch_get_result.sqlx) | SCALAR | (sketch BYTES, error_type STRING, threshold INT64) -> ARRAY<STRUCT<item STRING, estimate INT64, lower_bound INT64, upper_bound INT64>> | Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives or n [...] +| [frequent_strings_sketch_merge](fi/sqlx/frequent_strings_sketch_merge.sqlx) | AGGREGATE | (sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES. | +| [frequent_strings_sketch_build](fi/sqlx/frequent_strings_sketch_build.sqlx) | AGGREGATE | (item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings Sk [...] 
+| [frequent_strings_sketch_to_string](fi/sqlx/frequent_strings_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. | +| [frequent_strings_sketch_get_result](fi/sqlx/frequent_strings_sketch_get_result.sqlx) | SCALAR | (sketch BYTES, error_type STRING, threshold INT64) -> ARRAY<STRUCT<item STRING, estimate INT64, lower_bound INT64, upper_bound INT64>> | Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives or n [...] **Examples:** @@ -211,19 +215,21 @@ drop table `$BQ_DATASET`.fs_sketch; estimation sketch. They are known for their high accuracy and low memory consumption, making them suitable for large datasets and real-time analytics. +For more information: [HLL Sketches](https://datasketches.apache.org/docs/HLL/HllSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [hll_sketch_agg_string](hll/sqlx/hll_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_agg_union](hll/sqlx/hll_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. 
Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_agg_int64](hll/sqlx/hll_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_agg_string_lgk_type](hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an [...] -| [hll_sketch_agg_union_lgk_type](hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Retur [...] 
-| [hll_sketch_agg_int64_lgk_type](hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an H [...] -| [hll_sketch_get_estimate](hll/sqlx/hll_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_to_string](hll/sqlx/hll_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_union](hll/sqlx/hll_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/HLL/HllSketches.html | -| [hll_sketch_union_lgk_type](hll/sqlx/hll_sketch_union_lgk_type.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param 
sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br [...] -| [hll_sketch_get_estimate_and_bounds](hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the retu [...] +| [hll_sketch_agg_string](hll/sqlx/hll_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_agg_union](hll/sqlx/hll_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_agg_int64](hll/sqlx/hll_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. 
| +| [hll_sketch_agg_string_lgk_type](hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an [...] +| [hll_sketch_agg_union_lgk_type](hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Retur [...] +| [hll_sketch_agg_int64_lgk_type](hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an H [...] +| [hll_sketch_get_estimate](hll/sqlx/hll_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value. 
| +| [hll_sketch_to_string](hll/sqlx/hll_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch. | +| [hll_sketch_union](hll/sqlx/hll_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_union_lgk_type](hll/sqlx/hll_sketch_union_lgk_type.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br [...] +| [hll_sketch_get_estimate_and_bounds](hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the retu [...] **Examples:** @@ -294,23 +300,25 @@ drop table `$BQ_DATASET`.hll_sketch; quantiles for a dataset. They are useful for understanding the distribution of data and calculating percentiles, such as the median or 95th percentile. 
+For more information: [KLL Sketches](https://datasketches.apache.org/docs/KLL/KLLSketch.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [kll_sketch_float_build](kll/sqlx/kll_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_merge](kll/sqlx/kll_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaluts: k = 200.<br>Returns: a serialized KLL sketch as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_merge_k](kll/sqlx/kll_sketch_float_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_build_k](kll/sqlx/kll_sketch_float_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_get_n](kll/sqlx/kll_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as 
BYTES.<br>Returns: stream length as INT64<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_get_min_value](kll/sqlx/kll_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_to_string](kll/sqlx/kll_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_get_num_retained](kll/sqlx/kll_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_get_max_value](kll/sqlx/kll_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64<br><br>For more information:<br> \- https://datasketches.apache.org/docs/KLL/KLLSketch.html | -| [kll_sketch_float_get_normalized_rank_error](kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx) | SCALAR | (sketch BYTES, pmf BOOL) -> FLOAT64 | Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the 
get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queries. [...] -| [kll_sketch_float_get_rank](kll/sqlx/kll_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value.<br><br>For more infor [...] -| [kll_sketch_float_get_pmf](kll/sqlx/kll_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input value [...] -| [kll_sketch_float_kolmogorov_smirnov](kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, pvalue FLOAT64) -> BOOL | Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typically 0 [...] 
-| [kll_sketch_float_get_cdf](kll/sqlx/kll_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as th [...] -| [kll_sketch_float_get_quantile](kll/sqlx/kll_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<b [...] +| [kll_sketch_float_build](kll/sqlx/kll_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes. | +| [kll_sketch_float_merge](kll/sqlx/kll_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized KLL sketch as BYTES. | +| [kll_sketch_float_merge_k](kll/sqlx/kll_sketch_float_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES. 
| +| [kll_sketch_float_build_k](kll/sqlx/kll_sketch_float_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes. | +| [kll_sketch_float_get_n](kll/sqlx/kll_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 | +| [kll_sketch_float_get_min_value](kll/sqlx/kll_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 | +| [kll_sketch_float_to_string](kll/sqlx/kll_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. 
| +| [kll_sketch_float_get_num_retained](kll/sqlx/kll_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 | +| [kll_sketch_float_get_max_value](kll/sqlx/kll_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 | +| [kll_sketch_float_get_normalized_rank_error](kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx) | SCALAR | (sketch BYTES, pmf BOOL) -> FLOAT64 | Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queries. [...] +| [kll_sketch_float_get_rank](kll/sqlx/kll_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. 
| +| [kll_sketch_float_get_pmf](kll/sqlx/kll_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input value [...] +| [kll_sketch_float_kolmogorov_smirnov](kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, pvalue FLOAT64) -> BOOL | Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typically 0 [...] +| [kll_sketch_float_get_cdf](kll/sqlx/kll_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as th [...] +| [kll_sketch_float_get_quantile](kll/sqlx/kll_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<b [...] 
**Examples:** @@ -378,39 +386,41 @@ select `$BQ_DATASET`.kll_sketch_float_kolmogorov_smirnov( ``` -## THETA Sketch Functions +## Theta Sketch Functions **Description:** Theta sketches are used for set operations like union, intersection, and difference. They are efficient for estimating the size of these operations on large datasets, enabling applications like analyzing user overlap or comparing different groups. +For more information: [Theta sketches](https://datasketches.apache.org/docs/Theta/ThetaSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [theta_sketch_agg_int64](theta/sqlx/theta_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. <br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_agg_union](theta/sqlx/theta_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_agg_string](theta/sqlx/theta_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_agg_union_lgk_seed](theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the [...] -| [theta_sketch_agg_int64_lgk_seed_p](theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the und [...] -| [theta_sketch_agg_string_lgk_seed_p](theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the un [...] 
-| [theta_sketch_get_estimate](theta/sqlx/theta_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_to_string](theta/sqlx/theta_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_get_num_retained](theta/sqlx/theta_sketch_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_get_theta](theta/sqlx/theta_sketch_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_get_num_retained_seed](theta/sqlx/theta_sketch_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: 
number of retained entries as INT.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketc [...] -| [theta_sketch_get_estimate_seed](theta/sqlx/theta_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSke [...] -| [theta_sketch_to_string_seed](theta/sqlx/theta_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta [...] 
-| [theta_sketch_get_theta_seed](theta/sqlx/theta_sketch_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_intersection](theta/sqlx/theta_sketch_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_union](theta/sqlx/theta_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| [theta_sketch_a_not_b](theta/sqlx/theta_sketch_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br><br>For more information:<br> \- https://datasketches.apache.org/docs/Theta/ThetaSketches.html | -| 
[theta_sketch_intersection_seed](theta/sqlx/theta_sketch_intersection_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br>< [...] -| [theta_sketch_a_not_b_seed](theta/sqlx/theta_sketch_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.<br><b [...] -| [theta_sketch_union_lgk_seed](theta/sqlx/theta_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were config [...] -| [theta_sketch_get_estimate_and_bounds](theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from th [...] 
-| [theta_sketch_jaccard_similarity](theta/sqlx/theta_sketch_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A [...] -| [theta_sketch_get_estimate_and_bounds_seed](theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of stand [...] -| [theta_sketch_jaccard_similarity_seed](theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketch [...] +| [theta_sketch_agg_int64](theta/sqlx/theta_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
| +| [theta_sketch_agg_union](theta/sqlx/theta_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_agg_string](theta/sqlx/theta_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_agg_union_lgk_seed](theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the [...] +| [theta_sketch_agg_int64_lgk_seed_p](theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the und [...] 
+| [theta_sketch_agg_string_lgk_seed_p](theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the un [...] +| [theta_sketch_get_estimate](theta/sqlx/theta_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. | +| [theta_sketch_to_string](theta/sqlx/theta_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | +| [theta_sketch_get_num_retained](theta/sqlx/theta_sketch_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT. | +| [theta_sketch_get_theta](theta/sqlx/theta_sketch_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. 
| +| [theta_sketch_get_num_retained_seed](theta/sqlx/theta_sketch_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: number of retained entries as INT. | +| [theta_sketch_get_estimate_seed](theta/sqlx/theta_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. | +| [theta_sketch_to_string_seed](theta/sqlx/theta_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. | +| [theta_sketch_get_theta_seed](theta/sqlx/theta_sketch_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64. 
| +| [theta_sketch_intersection](theta/sqlx/theta_sketch_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_union](theta/sqlx/theta_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_a_not_b](theta/sqlx/theta_sketch_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_intersection_seed](theta/sqlx/theta_sketch_intersection_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
| +| [theta_sketch_a_not_b_seed](theta/sqlx/theta_sketch_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_union_lgk_seed](theta/sqlx/theta_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were config [...] +| [theta_sketch_get_estimate_and_bounds](theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from th [...] +| [theta_sketch_jaccard_similarity](theta/sqlx/theta_sketch_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A [...] 
+| [theta_sketch_get_estimate_and_bounds_seed](theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of stand [...] +| [theta_sketch_jaccard_similarity_seed](theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketch [...] **Examples:** @@ -556,45 +566,47 @@ select `$BQ_DATASET`.theta_sketch_jaccard_similarity_seed( ``` -## TUPLE Sketch Functions +## Tuple Sketch Functions **Description:** Tuple sketches extend the functionality of Theta sketches by allowing you to associate a summary value with each item in the set. This enables calculations like the sum, minimum, or maximum of values associated with the distinct items. 
+For more information: [Tuple sketches](https://datasketches.apache.org/docs/Tuple/TupleSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [tuple_sketch_int64_agg_union](tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be NU [...] -| [tuple_sketch_int64_agg_string](tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) | AGGREGATE | (key STRING, value INT64) -> BYTES | Builds a Tuple Sketch from an STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an STRING Key column and an I [...] -| [tuple_sketch_int64_agg_int64](tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) | AGGREGATE | (key INT64, value INT64) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT64 [...] 
-| [tuple_sketch_int64_agg_union_lgk_seed_mode](tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>P [...] -| [tuple_sketch_int64_agg_int64_lgk_seed_p_mode](tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) | AGGREGATE | (key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, pl [...] -| [tuple_sketch_int64_agg_string_lgk_seed_p_mode](tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) | AGGREGATE | (key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error tabl [...] -| [tuple_sketch_int64_to_string](tuple/sqlx/tuple_sketch_int64_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Defaults: seed = 9001.<br>Ret [...] 
-| [tuple_sketch_int64_get_estimate](tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality est [...] -| [tuple_sketch_int64_get_theta](tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64 [...] -| [tuple_sketch_int64_get_num_retained](tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of retai [...] -| [tuple_sketch_int64_get_theta_seed](tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to c [...] 
-| [tuple_sketch_int64_get_num_retained_seed](tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to [...] -| [tuple_sketch_int64_to_string_seed](tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Param s [...] -| [tuple_sketch_int64_a_not_b](tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the s [...] -| [tuple_sketch_int64_from_theta_sketch](tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) | SCALAR | (sketch BYTES, value INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br>De [...] 
-| [tuple_sketch_int64_get_estimate_seed](tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to conf [...] -| [tuple_sketch_int64_intersection](tuple/sqlx/tuple_sketch_int64_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketch " [...] -| [tuple_sketch_int64_union](tuple/sqlx/tuple_sketch_int64_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch " [...] -| [tuple_sketch_int64_from_theta_sketch_seed](tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) | SCALAR | (sketch BYTES, value INT64, seed INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This [...] 
-| [tuple_sketch_int64_a_not_b_seed](tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NUL [...] -| [tuple_sketch_int64_filter_low_high](tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation ac [...] -| [tuple_sketch_int64_get_estimate_and_bounds](tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>P [...] -| [tuple_sketch_int64_filter_low_high_seed](tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64, seed INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that car [...] 
-| [tuple_sketch_int64_jaccard_similarity](tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are dis [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds](tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> [...] -| [tuple_sketch_int64_intersection_seed_mode](tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64, mode STRING) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds_seed](tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as [...] 
-| [tuple_sketch_int64_union_lgk_seed_mode](tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as [...] -| [tuple_sketch_int64_get_estimate_and_bounds_seed](tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summ [...] -| [tuple_sketch_int64_jaccard_similarity_seed](tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, th [...] +| [tuple_sketch_int64_agg_union](tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be NU [...] 
+| [tuple_sketch_int64_agg_string](tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) | AGGREGATE | (key STRING, value INT64) -> BYTES | Builds a Tuple Sketch from an STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an STRING Key column and an I [...] +| [tuple_sketch_int64_agg_int64](tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) | AGGREGATE | (key INT64, value INT64) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT64 [...] +| [tuple_sketch_int64_agg_union_lgk_seed_mode](tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>P [...] +| [tuple_sketch_int64_agg_int64_lgk_seed_p_mode](tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) | AGGREGATE | (key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, pl [...] 
+| [tuple_sketch_int64_agg_string_lgk_seed_p_mode](tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) | AGGREGATE | (key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error tabl [...] +| [tuple_sketch_int64_to_string](tuple/sqlx/tuple_sketch_int64_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Defaults: seed = 9001.<br>Ret [...] +| [tuple_sketch_int64_get_estimate](tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality est [...] +| [tuple_sketch_int64_get_theta](tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. 
| +| [tuple_sketch_int64_get_num_retained](tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of retai [...] +| [tuple_sketch_int64_get_theta_seed](tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to c [...] +| [tuple_sketch_int64_get_num_retained_seed](tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to [...] +| [tuple_sketch_int64_to_string_seed](tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Param s [...] 
+| [tuple_sketch_int64_a_not_b](tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the s [...] +| [tuple_sketch_int64_from_theta_sketch](tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) | SCALAR | (sketch BYTES, value INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br>De [...] +| [tuple_sketch_int64_get_estimate_seed](tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to conf [...] +| [tuple_sketch_int64_intersection](tuple/sqlx/tuple_sketch_int64_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketch " [...] 
+| [tuple_sketch_int64_union](tuple/sqlx/tuple_sketch_int64_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch " [...] +| [tuple_sketch_int64_from_theta_sketch_seed](tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) | SCALAR | (sketch BYTES, value INT64, seed INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This [...] +| [tuple_sketch_int64_a_not_b_seed](tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NUL [...] +| [tuple_sketch_int64_filter_low_high](tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation ac [...] 
+| [tuple_sketch_int64_get_estimate_and_bounds](tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>P [...] +| [tuple_sketch_int64_filter_low_high_seed](tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64, seed INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that car [...] +| [tuple_sketch_int64_jaccard_similarity](tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are dis [...] +| [tuple_sketch_int64_get_sum_estimate_and_bounds](tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> [...] 
+| [tuple_sketch_int64_intersection_seed_mode](tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64, mode STRING) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as [...] +| [tuple_sketch_int64_get_sum_estimate_and_bounds_seed](tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as [...] +| [tuple_sketch_int64_union_lgk_seed_mode](tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as [...] +| [tuple_sketch_int64_get_estimate_and_bounds_seed](tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summ [...] 
+| [tuple_sketch_int64_jaccard_similarity_seed](tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, th [...] **Examples:** @@ -798,18 +810,20 @@ select `$BQ_DATASET`.tuple_sketch_int64_to_string_seed( **Description:** Similar to KLL sketch, estimates distributions of numeric values, provides approximate quantiles and ranks. +For more information: [t-digest](https://github.com/tdunning/t-digest) + | Function Name | Function Type | Signature | Description | |---|---|---|---| -| [tdigest_double_build](tdigest/sqlx/tdigest_double_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest | -| [tdigest_double_merge](tdigest/sqlx/tdigest_double_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest | -| [tdigest_double_merge_k](tdigest/sqlx/tdigest_double_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest | -| 
[tdigest_double_build_k](tdigest/sqlx/tdigest_double_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_get_max_value](tdigest/sqlx/tdigest_double_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_to_string](tdigest/sqlx/tdigest_double_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_get_total_weight](tdigest/sqlx/tdigest_double_get_total_weight.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_get_min_value](tdigest/sqlx/tdigest_double_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_get_rank](tdigest/sqlx/tdigest_double_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value.<br><br>For more information:<br> \- https://github.com/tdunning/t\-digest |
-| [tdigest_double_get_quantile](tdigest/sqlx/tdigest_double_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank.<br><br>For more information:<br> \- https://gith [...]
+| [tdigest_double_build](tdigest/sqlx/tdigest_double_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes. |
+| [tdigest_double_merge](tdigest/sqlx/tdigest_double_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES. |
+| [tdigest_double_merge_k](tdigest/sqlx/tdigest_double_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES. |
+| [tdigest_double_build_k](tdigest/sqlx/tdigest_double_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes. |
+| [tdigest_double_get_max_value](tdigest/sqlx/tdigest_double_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
+| [tdigest_double_to_string](tdigest/sqlx/tdigest_double_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. |
+| [tdigest_double_get_total_weight](tdigest/sqlx/tdigest_double_get_total_weight.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64 |
+| [tdigest_double_get_min_value](tdigest/sqlx/tdigest_double_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
+| [tdigest_double_get_rank](tdigest/sqlx/tdigest_double_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value.
| +| [tdigest_double_get_quantile](tdigest/sqlx/tdigest_double_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank. | **Examples:** diff --git a/README_template.md b/README_template.md index 62e2760..eaf75d6 100644 --- a/README_template.md +++ b/README_template.md @@ -81,6 +81,8 @@ accurate estimates with low memory usage and are particularly useful for applications like counting unique users, analyzing website traffic, or tracking distinct events. +For more information: [CPC Sketches](https://datasketches.apache.org/docs/CPC/CpcSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| @@ -91,6 +93,8 @@ frequencies of items in a dataset. They are effective for identifying the most frequent items, such as the top products purchased or the most popular search queries. +For more information: [Frequency Sketches](https://datasketches.apache.org/docs/Frequency/FrequencySketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| @@ -100,6 +104,8 @@ queries. estimation sketch. They are known for their high accuracy and low memory consumption, making them suitable for large datasets and real-time analytics. +For more information: [HLL Sketches](https://datasketches.apache.org/docs/HLL/HllSketches.html) + | Function Name | Function Type | Signature | Description | |---|---|---|---| @@ -109,26 +115,32 @@ consumption, making them suitable for large datasets and real-time analytics. quantiles for a dataset. They are useful for understanding the distribution of data and calculating percentiles, such as the median or 95th percentile. 
+For more information: [KLL Sketches](https://datasketches.apache.org/docs/KLL/KLLSketch.html)
+
 | Function Name | Function Type | Signature | Description |
 |---|---|---|---|
-## THETA Sketch Functions
+## Theta Sketch Functions
 **Description:** Theta sketches are used for set operations like union, intersection, and difference. They are efficient for estimating the size of these operations on large datasets, enabling applications like analyzing user overlap or comparing different groups.
+For more information: [Theta sketches](https://datasketches.apache.org/docs/Theta/ThetaSketches.html)
+
 | Function Name | Function Type | Signature | Description |
 |---|---|---|---|
-## TUPLE Sketch Functions
+## Tuple Sketch Functions
 **Description:** Tuple sketches extend the functionality of Theta sketches by allowing you to associate a summary value with each item in the set. This enables calculations like the sum, minimum, or maximum of values associated with the distinct items.
+For more information: [Tuple sketches](https://datasketches.apache.org/docs/Tuple/TupleSketches.html)
+
 | Function Name | Function Type | Signature | Description |
 |---|---|---|---|
@@ -137,5 +149,7 @@ the distinct items.
 **Description:** Similar to KLL sketch, estimates distributions of numeric values, provides approximate quantiles and ranks.
+For more information: [t-digest](https://github.com/tdunning/t-digest)
+
 | Function Name | Function Type | Signature | Description |
 |---|---|---|---|
diff --git a/readme_generator.py b/readme_generator.py
index 8ad9631..795d26e 100644
--- a/readme_generator.py
+++ b/readme_generator.py
@@ -75,6 +75,7 @@ def parse_sqlx(file_content: str, filename: str) -> dict:
     # Format multiline descriptions for Markdown and escape them
     description = escape_markdown(description)
+    description = re.compile(r'\n*For more info.*', re.M | re.S).sub('', description) # remove repetitive links
     description = description.replace('\n', '<br>') # Replace newlines with <br> tags
 
     # Extract function arguments and their types
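
For readers who want to see what the line added to readme_generator.py does to a description, here is a minimal sketch. The regular expression is copied from the diff; the helper name `strip_more_info` and the sample description are assumptions made for illustration and do not come from the repository.

```python
import re

# Same pattern as the line added to readme_generator.py in this commit.
MORE_INFO = re.compile(r'\n*For more info.*', re.M | re.S)


def strip_more_info(description: str) -> str:
    """Hypothetical helper: drop the 'For more information' block and
    everything after it (re.S lets '.' match across newlines)."""
    return MORE_INFO.sub('', description)


# Hypothetical description text, shaped like the ones in the .sqlx files.
sample = (
    "Returns: a t-Digest, as bytes.\n"
    "\n"
    "For more information:\n"
    " - https://github.com/tdunning/t-digest"
)

# Mirror the next step of the generator: newlines become <br> tags.
cleaned = strip_more_info(sample).replace('\n', '<br>')
print(cleaned)  # Returns: a t-Digest, as bytes.
```

Because the pattern uses `re.S`, any text that follows the "For more info" marker in a description is removed as well, so this approach assumes the link block is always the last part of the description.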
