soumyava commented on code in PR #15902: URL: https://github.com/apache/druid/pull/15902#discussion_r1491330021
########## docs/querying/sql-functions.md: ########## @@ -684,6 +700,14 @@ Calculates _e_ raised to the power of the numeric expression. Extracts the value of some unit of the timestamp, optionally from a certain time zone, and returns the number. +## FIRST_VALUE + +`FIRST_VALUE(expr)` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the value for the expression for the first row within the window. Review Comment: nit. Maybe value evaluated for the expression ########## docs/querying/sql-functions.md: ########## @@ -876,6 +916,14 @@ Returns the value of a numeric or string expression corresponding to the latest Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`. +## LEAD + +`LEAD(expr[, offset])` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the value evaluated at the row that follows the current row by the offset number within the window; if there is no such row, returns the given default value. `offset` defaults to 1 if not provided. Review Comment: Maybe similar wording as lag ########## docs/querying/sql-functions.md: ########## @@ -480,6 +480,14 @@ Calculates the trigonometric cotangent of an angle expressed in radians. Counts the number of rows. +## CUME_DIST + +`CUME_DIST()` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`. Review Comment: Should we indicate if the value ranges between 1/#rows and 1 similar to what postgres does ? ########## docs/querying/sql-window-functions.md: ########## @@ -88,40 +88,171 @@ You can use the OVER clause to treat other Druid aggregation functions as window Window functions support aliasing. -## Define a window with the OVER clause +## Window function syntax + +In general, Druid Window functions use the following syntax: + +```sql +SELECT + dimensions, + aggregation function(s) + window_function() + OVER ( PARTITION BY partitioning expression Review Comment: [PARTITION BY ...] [ORDER BY ...] as these can be optional ########## docs/querying/sql-functions.md: ########## @@ -1124,6 +1180,14 @@ Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON val Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified. +## PERCENT_RANK + +`PERCENT_RANK()` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the rank of the row calculated as a percentage according to the formula: `(rank - 1) / (total window rows - 1)`. Review Comment: nit. Returns relative rank of the current row ########## docs/querying/sql-functions.md: ########## @@ -1148,6 +1212,14 @@ Calculates a numerical expression raised to the specified power. Converts an angle from degrees to radians. +## RANK + +`RANK()` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) Review Comment: Should specify with gaps. @kgyrtkirk please chime in ########## docs/querying/sql-functions.md: ########## @@ -1092,6 +1140,14 @@ Returns a slice of the array from the zero-based start and end indexes. Joins all elements of the array together by the given delimiter. +## NTILE + +`NTILE(tiles)` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into. Review Comment: I'm a bit confused by this. Say A window size is 10 and tiles is 2. This wording indicates that it divides the window into 2 buckets but it ideally should mean divide the 10 into 5 buckets with 2 items in each. @kgyrtkirk can you please chime in ########## docs/querying/sql-functions.md: ########## @@ -1213,6 +1285,14 @@ Returns the rightmost number of characters from an expression. Calculates the rounded value for a numerical expression. +## ROW_NUMBER + +`ROW_NUMBER()` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the number of the row within the window. Review Comment: We should mention counting from 1 as postgres does. ########## docs/querying/sql-functions.md: ########## @@ -860,6 +884,22 @@ Extracts an `ARRAY<COMPLEX<json>>` value from `expr` at the specified `path`. If Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc) the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`. +## LAG + +`LAG(expr[, offset])` + +**Function type:** [Window](sql-window-functions.md#window-function-reference) + +Returns the value evaluated at the row that precedes the current row by the offset number within the window. `offset` defaults to 1 if not provided. Review Comment: Maybe make it a bit clear like `returns the value evaluated at the row which is offset rows preceding the current row within the window` ########## docs/querying/sql-window-functions.md: ########## @@ -88,40 +88,171 @@ You can use the OVER clause to treat other Druid aggregation functions as window Window functions support aliasing. -## Define a window with the OVER clause +## Window function syntax + +In general, Druid Window functions use the following syntax: + +```sql +SELECT + dimensions, + aggregation function(s) + window_function() + OVER ( PARTITION BY partitioning expression + ORDER BY order expression + frame clause + ) + FROM table + GROUP BY dimensions +``` The OVER clause defines the query windows for window functions as follows: - PARTITION BY indicates the dimension that defines the rows within the window - ORDER BY specifies the order of the rows within the windows. +for example, the following OVER clause example sets the window dimension to `channel` and orders the results by the absolute value of `delta` ascending: + +```sql +... +RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC) +... +``` + +Druid applies the GROUP BY dimensions first before calculating all non-window aggregation functions. Then it applies the window function over the aggregate results. Review Comment: Should also specify what an empty OVER() indicates ########## docs/querying/sql-functions.md: ########## @@ -876,6 +916,14 @@ Returns the value of a numeric or string expression corresponding to the latest Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`. +## LEAD Review Comment: Is this sorted alphabetically, since LEAD and LAG are similar I expected them to be together but if alphabetical it's all good too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
