This is an automated email from the ASF dual-hosted git repository. mhubail pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/asterixdb.git
commit ed4bd17f06d2d0aa2ebce21bcd70995d65299226 Author: Simon Dew <[email protected]> AuthorDate: Fri Apr 26 15:21:06 2024 +0100 [ASTERIXDB-3388][DOC]: Add ARRAY_MEDIAN() Function ... and associated MEDIAN() aggregation pseudo-function. - Add ARRAY_MEDIAN to the builtin functions - Add MEDIAN to aggregation pseudo-functions - Fix typo: SKEWENESS → SKEWNESS - Fix order of aggregation pseudo-function table - Update details of OVER clause with aggregate functions - Update return values for all aggregate functions Change-Id: Icb6459ea616244a048ed95757b3a0eb9094ba85a Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18251 Reviewed-by: Ali Alsuliman <[email protected]> Tested-by: Murtadha Hubail <[email protected]> --- .../src/main/markdown/builtins/9_aggregate_sql.md | 91 ++++++++++++++++------ .../asterix-doc/src/main/markdown/sqlpp/3_query.md | 13 ++-- .../src/main/markdown/sqlpp/4_windowfunctions.md | 4 +- 3 files changed, 77 insertions(+), 31 deletions(-) diff --git a/asterixdb/asterix-doc/src/main/markdown/builtins/9_aggregate_sql.md b/asterixdb/asterix-doc/src/main/markdown/builtins/9_aggregate_sql.md index 0657fb07c2..755fd39c77 100644 --- a/asterixdb/asterix-doc/src/main/markdown/builtins/9_aggregate_sql.md +++ b/asterixdb/asterix-doc/src/main/markdown/builtins/9_aggregate_sql.md @@ -48,8 +48,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * or, a `missing` value. * Return Value: * a `bigint` value representing the number of non-null and non-missing items in the given collection, - * `null` is returned if the input is `null` or `missing`, - * any other non-array and non-multiset input value will cause an error. + * `0` is returned if the input is `null` or `missing`, + * `0` is returned if the input is not an array or a multiset. * Example: @@ -77,8 +77,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the average of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, - * any other non-numeric value in the input collection will cause a type error. + * `null` is returned if the input is not an array or a multiset, + * any other non-numeric value in the input collection will be ignored. * Example: @@ -107,8 +107,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, - * any other non-numeric value in the input collection will cause a type error. + * `null` is returned if the input is not an array or a multiset, + * any other non-numeric value in the input collection will be ignored. * Example: @@ -136,8 +136,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. type promotion order (`tinyint`-> `smallint`->`integer`->`bigint`->`float`->`double`) among numeric items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * multiple incomparable items in the input array or multiset will cause a type error, - * any other non-array and non-multiset input value will cause a type error. + * `null` is returned if there are incomparable items in the input array or multiset, + * `null` is returned if the input is not an array or a multiset. * Example: @@ -165,8 +165,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. type promotion order (`tinyint`-> `smallint`->`integer`->`bigint`->`float`->`double`) among numeric items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * multiple incomparable items in the input array or multiset will cause a type error, - * any other non-array and non-multiset input value will cause a type error. + * `null` is returned if there are incomparable items in the input array or multiset, + * `null` is returned if the input is not an array or a multiset. * Example: @@ -177,6 +177,44 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. 3.4 +### array_median ### + * Syntax: + + array_median(num_collection) + + * Gets the median value of the numeric items in the given collection, ignoring null, missing, and non-numeric items. + + The function starts by sorting the numeric items. + + - If there is an odd number of numeric items, the function returns the item that is exactly in the middle of the range: that is, it has the same number of items before and after. + - If there is an even number of numeric items, the function returns the mean of the two items that are exactly in the middle of the range. + + * Note: You cannot use the `DISTINCT` keyword with this function, or with the `median` aggregation pseudo-function. + The `median` aggregation pseudo-function does support the `FILTER` clause. + There is no `strict_median` function corresponding to this function. + * Arguments: + * `num_collection` could be: + * an `array` or `multiset` of numbers, + * or, a `null` value, + * or, a `missing` value. + * Clauses: When used as a window function, this function supports the [Window Partition Clause](manual.html#Window_partition_clause), but not the [Window Order Clause](manual.html#Window_order_clause) or the [Window Frame Clause](manual.html#Window_frame_clause). + * Return Value: + * a `double` value representing the median of the numeric items in the given collection, + * `null` is returned if the input is `null` or `missing`, + * `null` is returned if the given collection does not contain any numeric items, + * `null` is returned if the input is not an array or a multiset, + * any other non-numeric value in the input collection will be ignored. + * Example: + + { "v1": array_median( [1.2, 2.3, 3.4, 0, null, missing], + "v2": array_median( [1.2, 2.3, 3.4, 4.5, 0, null, missing] ) }; + + * The expected result is: + + { "v1": 1.75, + "v2": 2.3 } + + ### array_stddev_samp ### * Syntax: @@ -193,7 +231,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the sample standard deviation of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -220,7 +258,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the population standard deviation of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -247,7 +285,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the sample variance of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -274,7 +312,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the population variance of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -301,7 +339,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the skewness of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -328,7 +366,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the kurtosis from a normal distribution of the non-null and non-missing numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if the given collection does not contain any non-null and non-missing items, - * any other non-array and non-multiset input value will cause a type error, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -352,7 +390,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * or a `missing` value. * Return Value: * a `bigint` value representing the number of items in the given collection, - * `null` is returned if the input is `null` or `missing`. + * `0` is returned if the input is `null` or `missing`, + * `0` is returned if the input is not an array or a multiset. * Example: @@ -377,7 +416,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the average of the numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, - * any other non-numeric value in the input collection will cause a type error. + * `null` is returned if the input is not an array or a multiset, + * `null` is returned if there are any other non-numeric values in the input collection. * Example: @@ -404,7 +444,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, - * any other non-numeric value in the input collection will cause a type error. + * `null` is returned if the input is not an array or a multiset, + * `null` is returned if there are any other non-numeric values in the input collection. * Example: @@ -431,8 +472,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. (`tinyint`-> `smallint`->`integer`->`bigint`->`float`->`double`) among numeric items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, - * multiple incomparable items in the input array or multiset will cause a type error, - * any other non-array and non-multiset input value will cause a type error. + * `null` is returned if there are incomparable items in the input array or multiset, + * `null` is returned if the input is not an array or a multiset. * Example: @@ -460,8 +501,8 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. (`tinyint`-> `smallint`->`integer`->`bigint`->`float`->`double`) among numeric items. * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, - * multiple incomparable items in the input array or multiset will cause a type error, - * any other non-array and non-multiset input value will cause a type error. + * `null` is returned if there are incomparable items in the input array or multiset, + * `null` is returned if the input is not an array or a multiset. * Example: @@ -536,6 +577,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the sample variance of the numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -561,6 +603,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the population variance of the numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -586,6 +629,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the skewness of the numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: @@ -611,6 +655,7 @@ Refer to [OVER Clauses](manual.html#Over_clauses) for details. * a `double` value representing the kurtosis from a normal distribution of the numbers in the given collection, * `null` is returned if the input is `null` or `missing`, * `null` is returned if there is a `null` or `missing` in the input collection, + * `null` is returned if the input is not an array or a multiset, * any other non-numeric value in the input collection will cause a type error. * Example: diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md index 04a65d1784..2bd9f49398 100644 --- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md +++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md @@ -1215,20 +1215,21 @@ The aggregation pseudo-function `COUNT` has a special form in which its operand For example, `SELECT COUNT(*) FROM customers` simply returns the total number of customers, whereas `SELECT COUNT(rating) FROM customers` returns the number of customers who have known ratings (that is, their ratings are not `null` or `missing`). -Because the aggregation pseudo-functions sometimes restructure their operands, they can be used only in query blocks where (explicit or implicit) grouping is being done. Therefore the pseudo-functions cannot operate directly on arrays or multisets. For operating directly on JSON collections, SQL++ provides a set of ordinary functions for computing aggregations. Each ordinary aggregation function (except the ones corresponding to `COUNT` and `ARRAY_AGG`) has two versions: one that ignores [...] +Because the aggregation pseudo-functions sometimes restructure their operands, they can be used only in query blocks where (explicit or implicit) grouping is being done. Therefore the pseudo-functions cannot operate directly on arrays or multisets. For operating directly on JSON collections, SQL++ provides a set of ordinary functions for computing aggregations. Each ordinary aggregation function (except as noted below) has two versions: one that ignores `null` and `missing` values, and o [...] | Aggregation pseudo-function; operates on groups only | Ordinary function: Ignores NULL or MISSING values | Ordinary function: Returns NULL if NULL or MISSING are encountered| |----------|----------|--------| -|SUM| ARRAY_SUM| STRICT_SUM | -| AVG |ARRAY_MAX| STRICT_MAX | -| MAX | ARRAY_MIN| STRICT_MIN | -| MIN | ARRAY_AVG| STRICT_AVG | +| SUM | ARRAY_SUM| STRICT_SUM | +| AVG | ARRAY_AVG| STRICT_AVG | +| MAX | ARRAY_MAX| STRICT_MAX | +| MIN | ARRAY_MIN| STRICT_MIN | | COUNT |ARRAY_COUNT|STRICT_COUNT (see exception below) | +| MEDIAN | ARRAY_MEDIAN | | |STDDEV_SAMP|ARRAY_STDDEV_SAMP| STRICT_STDDEV_SAMP | |STDDEV_POP|ARRAY_STDDEV_POP| STRICT_STDDEV_POP | |VAR_SAMP|ARRAY_VAR_SAMP| STRICT_VAR_SAMP | |VAR_POP|ARRAY_VAR_POP| STRICT_VAR_POP | -|SKEWENESS|ARRAY_SKEWNESS| STRICT_SKEWNESS | +|SKEWNESS|ARRAY_SKEWNESS| STRICT_SKEWNESS | |KURTOSIS|ARRAY_KURTOSIS| STRICT_KURTOSIS | | |ARRAY_AGG| | diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_windowfunctions.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_windowfunctions.md index d6e40e76c4..c41e8a3a5b 100644 --- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_windowfunctions.md +++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_windowfunctions.md @@ -108,7 +108,7 @@ If omitted, all tuples are united in a single partition. The *window order clause* determines how tuples are ordered within each partition. The window function works on tuples in the order specified by this clause. -This clause may be used with any [window function](builtins.html#WindowFunctions), or any [aggregate function](builtins.html#AggregateFunctions) used as a window function. +This clause may be used with any [window function](builtins.html#WindowFunctions), and most [aggregate functions](builtins.html#AggregateFunctions) — refer to the descriptions of individual functions for more details. This clause is optional. If omitted, all tuples are considered peers, i.e. their order is tied. When tuples in the window partition are tied, each window function behaves differently. @@ -130,7 +130,7 @@ This clause is optional. If omitted, all tuples are considered peers, i.e. their ##### WindowFrameClause  -The *window frame clause* defines the window frame. It can be used with all [aggregate functions](builtins.html#AggregateFunctions) and some [window functions](builtins.html#WindowFunctions) — refer to the descriptions of individual functions for more details. It is optional and allowed only when the [window order clause](#Window_order_clause) is present. +The *window frame clause* defines the window frame. It can be used with some [window functions](builtins.html#WindowFunctions) and most [aggregate functions](builtins.html#AggregateFunctions) — refer to the descriptions of individual functions for more details. It is optional and allowed only when the [window order clause](#Window_order_clause) is present. * If this clause is omitted and there is no [window order clause](#Window_order_clause), the window frame is the entire partition.
