This is an automated email from the ASF dual-hosted git repository.
victoria pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 3caacba8c52 update window functions doc (#15902)
3caacba8c52 is described below
commit 3caacba8c5297eac767f24e247aebe0483cfced5
Author: Charles Smith <[email protected]>
AuthorDate: Thu Mar 7 15:16:52 2024 -0800
update window functions doc (#15902)
Co-authored-by: Victoria Lim <[email protected]>
---
docs/querying/sql-functions.md | 80 +++++++++++
docs/querying/sql-translation.md | 1 -
docs/querying/sql-window-functions.md | 250 +++++++++++++++++++++++++++++++---
website/.spelling | 6 +
website/sidebars.json | 2 +-
5 files changed, 318 insertions(+), 21 deletions(-)
diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md
index 2d0c51f6c12..9a43afab1d7 100644
--- a/docs/querying/sql-functions.md
+++ b/docs/querying/sql-functions.md
@@ -480,6 +480,14 @@ Calculates the trigonometric cotangent of an angle
expressed in radians.
Counts the number of rows.
+## CUME_DIST
+
+`CUME_DIST()`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the cumulative distribution of the current row within the window
calculated as `number of window rows at the same rank or higher than current
row` / `total window rows`. The return value ranges between `1/number of rows`
and 1.
+
## CURRENT_DATE
`CURRENT_DATE`
@@ -529,6 +537,14 @@ Decodes a Base64-encoded string into a UTF-8 encoded
string.
Converts an angle from radians to degrees.
+## DENSE_RANK
+
+`DENSE_RANK()`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the rank for a row within a window without gaps. For example, if two
rows tie for a rank of 1, the subsequent row is ranked 2.
+
## DIV
`DIV(x, y)`
@@ -684,6 +700,14 @@ Calculates _e_ raised to the power of the numeric
expression.
Extracts the value of some unit of the timestamp, optionally from a certain
time zone, and returns the number.
+## FIRST_VALUE
+
+`FIRST_VALUE(expr)`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the value evaluated for the expression for the first row within the
window.
+
## FLOOR (date and time)
`FLOOR(<TIMESTAMP> TO <TIME_UNIT>)`
@@ -860,6 +884,22 @@ Extracts an `ARRAY<COMPLEX<json>>` value from `expr` at
the specified `path`. If
Extracts a literal value from `expr` at the specified `path`. If you specify
`RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc)
the function plans the query using the suggested type. Otherwise, it attempts
to infer the type based on the context. If it can't infer the type, it defaults
to `VARCHAR`.
+## LAG
+
+`LAG(expr[, offset])`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+If you do not supply an `offset`, returns the value evaluated at the row
preceding the current row. Specify an offset number `n` to return the value
evaluated at `n` rows preceding the current one.
+
+## LAST_VALUE
+
+`LAST_VALUE(expr)`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the value evaluated for the expression for the last row within the
window.
+
## LATEST
`LATEST(expr, [maxBytesPerValue])`
@@ -876,6 +916,14 @@ Returns the value of a numeric or string expression
corresponding to the latest
Returns the value of a numeric or string expression corresponding to the
latest time value from `timestampExpr`.
+## LEAD
+
+`LEAD(expr[, offset])`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+If you do not supply an `offset`, returns the value evaluated at the row
following the current row. Specify an offset number `n` to return the value
evaluated at `n` rows following the current one; if there is no such row,
returns the given default value.
+
## LEAST
`LEAST([expr1, ...])`
@@ -1092,6 +1140,14 @@ Returns a slice of the array from the zero-based start
and end indexes.
Joins all elements of the array together by the given delimiter.
+## NTILE
+
+`NTILE(tiles)`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Divides the rows within a window as evenly as possible into the number of
tiles, also called buckets, and returns the value of the tile that the row
falls into.
+
## NULLIF
`NULLIF(value1, value2)`
@@ -1124,6 +1180,14 @@ Parses `expr` into a `COMPLEX<json>` object. This
operator deserializes JSON val
Converts a string into a BIGINT with the given base or into a DECIMAL data
type if the base is not specified.
+## PERCENT_RANK
+
+`PERCENT_RANK()`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the relative rank of the row calculated as a percentage according to
the formula: `RANK() OVER (window) / COUNT(1) OVER (window)`.
+
## POSITION
`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
@@ -1148,6 +1212,14 @@ Calculates a numerical expression raised to the
specified power.
Converts an angle from degrees to radians.
+## RANK
+
+`RANK()`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the rank with gaps for a row within a window. For example, if two rows
tie for rank 1, the next rank is 3.
+
## REGEXP_EXTRACT
`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
@@ -1213,6 +1285,14 @@ Returns the rightmost number of characters from an
expression.
Calculates the rounded value for a numerical expression.
+## ROW_NUMBER
+
+`ROW_NUMBER()`
+
+**Function type:** [Window](sql-window-functions.md#window-function-reference)
+
+Returns the number of the row within the window starting from 1.
+
## RPAD
`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
diff --git a/docs/querying/sql-translation.md b/docs/querying/sql-translation.md
index 29edc932531..63ef8fdbdb3 100644
--- a/docs/querying/sql-translation.md
+++ b/docs/querying/sql-translation.md
@@ -822,7 +822,6 @@ Druid does not support all SQL features. In particular, the
following features a
- JOIN conditions that are not an equality between expressions from the left-
and right-hand sides.
- JOIN conditions containing a constant value inside the condition.
- JOIN conditions on a column which contains a multi-value dimension.
-- OVER clauses, and analytic functions such as `LAG` and `LEAD`.
- ORDER BY for a non-aggregating query, except for `ORDER BY __time` or `ORDER
BY __time DESC`, which are supported.
This restriction only applies to non-aggregating queries; you can ORDER BY
any column in an aggregating query.
- DDL and DML.
diff --git a/docs/querying/sql-window-functions.md
b/docs/querying/sql-window-functions.md
index 4624ef07715..d64538779f0 100644
--- a/docs/querying/sql-window-functions.md
+++ b/docs/querying/sql-window-functions.md
@@ -1,6 +1,7 @@
---
id: sql-window-functions
title: Window functions
+description: Reference for window functions
---
<!--
@@ -38,10 +39,10 @@ Set the context parameter `enableWindowing: true` to use
window functions.
Window functions in Apache Druid produce values based upon the relationship of
one row within a window of rows to the other rows within the same window. A
window is a group of related rows within a result set. For example, rows with
the same value for a specific dimension.
-The following example organizes results with the same `channel` value into
windows. For each window, the query returns the rank of each row in ascending
order based upon its `delta` value.
-
Window functions in Druid require a GROUP BY statement. Druid performs the
row-level aggregations for the GROUP BY before performing the window function
calculations.
+The following example organizes results with the same `channel` value into
windows. For each window, the query returns the rank of each row in ascending
order based upon its `changed` value.
+
```sql
SELECT FLOOR(__time TO DAY) AS event_time,
channel,
@@ -88,11 +89,55 @@ You can use the OVER clause to treat other Druid
aggregation functions as window
Window functions support aliasing.
-## Define a window with the OVER clause
+## Window function syntax
+
+You can write a window function in Druid using either syntax below.
+The second syntax shows a window alias to reference a window that you can
reuse.
+
+
+```sql
+window_function() OVER (
+ [PARTITION BY partitioning expression]
+ [ORDER BY order expression]
+ [[ROWS, RANGE] BETWEEN range start AND range end])
+FROM table
+GROUP BY dimensions
+```
+
+```sql
+window_function() OVER w
+FROM table
+WINDOW w AS ([PARTITION BY partitioning expression] [ORDER BY order expression]
+ [[ROWS, RANGE] BETWEEN range start AND range end])
+GROUP BY dimensions
+```
The OVER clause defines the query windows for window functions as follows:
-- PARTITION BY indicates the dimension that defines the rows within the window
-- ORDER BY specifies the order of the rows within the windows.
+- PARTITION BY indicates the dimension that defines window boundaries
+- ORDER BY specifies the order of the rows within the windows
+
+An empty OVER clause or the absence of a PARTITION BY clause indicates that
all data belongs to a single window.
+
+In the following example, the following OVER clause example sets the window
dimension to `channel` and orders the results by the absolute value of `delta`
ascending:
+
+```sql
+...
+RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC)
+...
+```
+
+Window frames, set in ROWS and RANGE expressions, limit the set of rows used
for the windowed aggregation.
+
+ROWS and RANGE accept the following values for `range start` and `range end`:
+- UNBOUNDED PRECEDING: from the beginning of the window as ordered by the
order expression
+- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order
expression
+- CURRENT ROW: the current row
+- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order
expression
+- UNBOUNDED FOLLOWING: to the end of the window as ordered by the order
expression
+
+See [Example with window frames](#example-with-window-frames) for more detail.
+
+Druid applies the GROUP BY dimensions before calculating all non-window
aggregation functions. Then it applies the window function over the aggregated
results.
:::note
@@ -100,28 +145,126 @@ Sometimes windows are called partitions. However, the
partitioning for window fu
:::
-The following OVER clause example sets the window dimension to `channel` and
orders the results by the absolute value of `delta` ascending:
+### ORDER BY windows
+
+When the window definition only specifies ORDER BY and not PARTITION BY, it
sorts the aggregate data set and applies the function in that order.
+
+The following query uses `ORDER BY SUM(delta) DESC` to rank user hourly
activity from the most changed the least changed within an hour:
```sql
-...
-RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC)
-...
+SELECT
+ TIME_FLOOR(__time, 'PT1H') as time_hour,
+ channel,
+ user,
+ SUM(delta) net_user_changes,
+ RANK() OVER (ORDER BY SUM(delta) DESC) AS editing_rank
+FROM "wikipedia"
+WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
+ AND __time BETWEEN '2016-06-27' AND '2016-06-28'
+GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
+ORDER BY 5
```
+<details>
+<summary> View results </summary>
+
+| `time_hour` | `channel` | `user` | `net_user_changes` | `editing_rank` |
+| --- | --- | --- | --- | --- |
+| `2016-06-27T15:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 6900| 1 |
+| `2016-06-27T19:00:00.000Z` | `#lt.wikipedia` | `77.221.66.41` | 4358 | 2 |
+| `2016-06-27T09:00:00.000Z` | `#kk.wikipedia` | `Салиха` | 2702 | 3 |
+| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 2440 | 4 |
+| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 894 | 5 |
+| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `178.11.203.212` | 447 | 6 |
+| `2016-06-27T11:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 126 | 7
|
+| `2016-06-27T06:00:00.000Z` | `#kk.wikipedia` | `Шокай` | 91 | 8 |
+| `2016-06-27T11:00:00.000Z` | `#lt.wikipedia` | `MaryroseB54` | 59 | 9 |
+| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 56 | 10
|
+| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Karoliuk` | 53 | 11 |
+| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 28 | 12 |
+| `2016-06-27T07:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 13 | 13 |
+| `2016-06-27T10:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 1 | 14 |
+| `2016-06-27T07:00:00.000Z` | `#kk.wikipedia` | `Салиха` | -1 | 15 |
+| `2016-06-27T06:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | -2 | 16 |
+</details>
+
+### PARTITION BY windows
+
+When a window only specifies PARTITION BY partition expression, Druid
calculates the aggregate window function over all the rows that share a value
within the selected dataset.
+
+The following example demonstrates a query that uses two different
windows—`PARTITION BY channel` and `PARTITION BY user`—to calculate the total
activity in the channel and total activity by the user so that they can be
compared to individual hourly activity:
+
+```sql
+SELECT
+ TIME_FLOOR(__time, 'PT1H') as time_hour,
+ channel,
+ user,
+ SUM(delta) AS hourly_user_changes,
+ SUM(SUM(delta)) OVER (PARTITION BY user) AS total_user_changes,
+ SUM(SUM(delta)) OVER (PARTITION BY channel) AS total_channel_changes
+FROM "wikipedia"
+WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
+ AND __time BETWEEN '2016-06-27' AND '2016-06-28'
+GROUP BY TIME_FLOOR(__time, 'PT1H'), 2, 3
+ORDER BY channel, TIME_FLOOR(__time, 'PT1H'), user
+```
+
+<details>
+<summary> View results </summary>
+
+| `time_hour` | `channel` | `user` | `hourly_user_changes` |
`total_user_changes` | `total_channel_changes` |
+| --- | ---| ---| --- | --- | --- |
+| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 2440 | 9340 |
12314 |
+| `2016-06-27T04:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 56 |
182 | 12314 |
+| `2016-06-27T06:00:00.000Z` | `#kk.wikipedia` | `Шокай` | 91 | 91 | 12314 |
+| `2016-06-27T07:00:00.000Z` | `#kk.wikipedia` | `Салиха` | -1 | 2701 | 12314 |
+| `2016-06-27T09:00:00.000Z` | `#kk.wikipedia` | `Салиха` | 2702 | 2701 |
12314 |
+| `2016-06-27T11:00:00.000Z` | `#kk.wikipedia` | `Нұрлан Рахымжанов` | 126 |
182 | 12314 |
+| `2016-06-27T15:00:00.000Z` | `#kk.wikipedia` | `Nurkhan` | 6900 | 9340 |
12314 |
+| `2016-06-27T06:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | -2 | 39 | 5851
|
+| `2016-06-27T07:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 13 | 39 | 5851
|
+| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `178.11.203.212` | 447 | 447
| 5851 |
+| `2016-06-27T09:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 894 | 895 |
5851 |
+| `2016-06-27T10:00:00.000Z` | `#lt.wikipedia` | `80.4.147.222` | 1 | 895 |
5851 |
+| `2016-06-27T11:00:00.000Z` | `#lt.wikipedia` | `MaryroseB54` | 59 | 59 |
5851 |
+| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Karoliuk` | 53 | 53 | 5851 |
+| `2016-06-27T12:00:00.000Z` | `#lt.wikipedia` | `Powermelon` | 28 | 39 | 5851
|
+| `2016-06-27T19:00:00.000Z` | `#lt.wikipedia` | `77.221.66.41` | 4358 | 4358
| 5851 |
+
+</details>
+
+In this example, the dataset is filtered for a single day. Therefore the
window function results represent the total activity for the day, for the
`user` and for the `channel` dimensions respectively.
+
+This type of result helps you analyze the impact of an individual user's
hourly activity:
+- the impact to the channel by comparing `hourly_user_changes` to
`total_channel_changes`
+- the impact of each user over the channel by `total_user_changes` to
`total_channel_changes`
+- the progress of each user's individual activity by comparing
`hourly_user_changes` to `total_user_changes`
+
+#### Window frame guardrails
+
+Druid has guardrail logic to prevent you from executing window function
queries with window frame expressions that might return unexpected results.
+
+For example:
+- You cannot set expressions as bounds for window frames.
+- You cannot use two FOLLOWING expressions in the window frame. For example:
`ROWS BETWEEN 2 ROWS FOLLOWING and 3 ROWS FOLLOWING`.
+- You can only use a RANGE frames when both endpoints are unbounded or current
row.
+
+If you write a query that violates one of these conditions, Druid throws an
error: "The query contains a window frame which may return incorrect results.
To disregard this warning, set `windowingStrictValidation` to false in the
query context."
+
## Window function reference
|Function|Notes|
|--------|-----|
-| `ROW_NUMBER()`| Returns the number of the row within the window |
-|`RANK()`| Returns the rank for a row within a window |
-|`DENSE_RANK()`| Returns the rank for a row within a window without gaps. For
example, if two rows tie for rank of 1, the subsequent row is ranked 2. |
-|`PERCENT_RANK()`| Returns the rank of the row calculated as a percentage
according to the formula: `(rank - 1) / (total window rows - 1)` |
-|`CUME_DIST()`| Returns the cumulative distribution of the current row within
the window calculated as `number of window rows at the same rank or higher than
current row` / `total window rows` |
-|`NTILE(tiles)`| Divides the rows within a window as evenly as possible into
the number of tiles, also called buckets, and returns the value of the tile
that the row falls into | None |
-|`LAG(expr[, offset])`| Returns the value evaluated at the row that precedes
the current row by the offset number within the window. `offset` defaults to 1
if not provided |
-|`LEAD(expr[, offset])`| Returns the value evaluated at the row that follows
the current row by the offset number within the window; if there is no such
row, returns the given default value. `offset` defaults to 1 if not provided |
-|`FIRST_VALUE(expr)`| Returns the value for the expression for the first row
within the window |
-|`LAST_VALUE(expr)`| Returns the value for the expression for the last row
within the window |
+| `ROW_NUMBER()` | Returns the number of the row within the window starting
from 1 |
+| `RANK()` | Returns the rank with gaps for a row within a window. For
example, if two rows tie for rank 1, the next rank is 3 |
+| `DENSE_RANK()` | Returns the rank for a row within a window without gaps.
For example, if two rows tie for rank of 1, the subsequent row is ranked 2. |
+| `PERCENT_RANK()` | Returns the relative rank of the row calculated as a
percentage according to the formula: `RANK() OVER (window) / COUNT(1) OVER
(window)` |
+| `CUME_DIST()` | Returns the cumulative distribution of the current row
within the window calculated as number of window rows at the same rank or
higher than current row divided by total window rows. The return value ranges
between `1/number of rows` and 1 |
+| `NTILE(tiles)` | Divides the rows within a window as evenly as possible into
the number of tiles, also called buckets, and returns the value of the tile
that the row falls into | None |
+| `LAG(expr[, offset])` | If you do not supply an `offset`, returns the value
evaluated at the row preceding the current row. Specify an offset number, `n`,
to return the value evaluated at `n` rows preceding the current one |
+| `LEAD(expr[, offset])` | If you do not supply an `offset`, returns the value
evaluated at the row following the current row. Specify an offset number `n` to
return the value evaluated at `n` rows following the current one; if there is
no such row, returns the given default value |
+| `FIRST_VALUE(expr)` | Returns the value evaluated for the expression for the
first row within the window |
+| `LAST_VALUE(expr)` | Returns the value evaluated for the expression for the
last row within the window |
## Examples
@@ -214,6 +357,75 @@ GROUP BY channel, __time, delta
</details>
+### Example with window frames
+
+The following query uses a few different window frames to calculate overall
activity by channel:
+
+```sql
+SELECT
+ channel,
+ TIME_FLOOR(__time, 'PT1H') AS time_hour,
+ SUM(delta) AS hourly_channel_changes,
+ SUM(SUM(delta)) OVER cumulative AS cumulative_activity_in_channel,
+ SUM(SUM(delta)) OVER moving5 AS csum5,
+ COUNT(1) OVER moving5 AS count5
+FROM "wikipedia"
+WHERE channel = '#en.wikipedia'
+ AND __time BETWEEN '2016-06-27' AND '2016-06-28'
+GROUP BY 1, TIME_FLOOR(__time, 'PT1H')
+WINDOW cumulative AS (
+ PARTITION BY channel
+ ORDER BY TIME_FLOOR(__time, 'PT1H')
+ ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ )
+ ,
+ moving5 AS (
+ PARTITION BY channel
+ ORDER BY TIME_FLOOR(__time, 'PT1H')
+ ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
+ )
+```
+
+<details>
+<summary> View results </summary>
+
+| `channel` | `time_hour` | `hourly_channel_changes` |
`cumulative_activity_in_channel` | `csum5` | `count5` |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| `#en.wikipedia` | `2016-06-27T00:00:00.000Z` | 74996 | 74996 | 74996 | 1 |
+| `#en.wikipedia` | `2016-06-27T01:00:00.000Z` | 24150 | 99146 | 99146 | 2 |
+| `#en.wikipedia` | `2016-06-27T02:00:00.000Z` | 102372 | 201518 | 201518 | 3 |
+| `#en.wikipedia` | `2016-06-27T03:00:00.000Z` | 61362 | 262880 | 262880 | 4 |
+| `#en.wikipedia` | `2016-06-27T04:00:00.000Z` | 61666 | 324546 | 324546 | 5 |
+| `#en.wikipedia` | `2016-06-27T05:00:00.000Z` | 144199 | 468745 | 393749 | 5 |
+| `#en.wikipedia` | `2016-06-27T06:00:00.000Z` | 33414 | 502159 | 403013 | 5 |
+| `#en.wikipedia` | `2016-06-27T07:00:00.000Z` | 79397 | 581556 | 380038 | 5 |
+| `#en.wikipedia` | `2016-06-27T08:00:00.000Z` | 104436 | 685992 | 423112 | 5 |
+| `#en.wikipedia` | `2016-06-27T09:00:00.000Z` | 58020 | 744012 | 419466 | 5 |
+| `#en.wikipedia` | `2016-06-27T10:00:00.000Z` | 93904 | 837916 | 369171 | 5 |
+| `#en.wikipedia` | `2016-06-27T11:00:00.000Z` | 74436 | 912352 | 410193 | 5 |
+| `#en.wikipedia` | `2016-06-27T12:00:00.000Z` | 83491 | 995843 | 414287 | 5 |
+| `#en.wikipedia` | `2016-06-27T13:00:00.000Z` | 103051 | 1098894 | 412902 | 5
|
+| `#en.wikipedia` | `2016-06-27T14:00:00.000Z` | 211411 | 1310305 | 566293 | 5
|
+| `#en.wikipedia` | `2016-06-27T15:00:00.000Z` | 101247 | 1411552 | 573636 | 5
|
+| `#en.wikipedia` | `2016-06-27T16:00:00.000Z` | 189765 | 1601317 | 688965 | 5
|
+| `#en.wikipedia` | `2016-06-27T17:00:00.000Z` | 74404 | 1675721 | 679878 | 5 |
+| `#en.wikipedia` | `2016-06-27T18:00:00.000Z` | 104824 | 1780545 | 681651 | 5
|
+| `#en.wikipedia` | `2016-06-27T19:00:00.000Z` | 71268 | 1851813 | 541508 | 5 |
+| `#en.wikipedia` | `2016-06-27T20:00:00.000Z` | 88185 | 1939998 | 528446 | 5 |
+| `#en.wikipedia` | `2016-06-27T21:00:00.000Z` | 42584 | 1982582 | 381265 | 5 |
+
+</details>
+
+The example defines multiple window specifications in the WINDOW clause that
you can use for various window function calculations.
+
+The query uses two windows:
+- `cumulative` is partitioned by channel and includes all rows from the
beginning of partition up to the current row as ordered by `__time` to enable
cumulative aggregation
+- `moving5` is also partitioned by channel but only includes up to the last
four rows and the current row as ordered by time
+
+The number of rows considered for the `moving5` window for the `count5` column:
+- starts at a single row because there are no rows before the current one
+- grows up to five rows as defined by `ROWS BETWEEN 4 ROWS PRECEDING AND
CURRENT ROW`
+
## Known issues
The following are known issues with window functions:
diff --git a/website/.spelling b/website/.spelling
index f26711c6d5f..86495fdf84e 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -57,7 +57,9 @@ CloudWatch
ColumnDescriptor
Corretto
CLI
+CUME_DIST
DDL
+DENSE_RANK
DML
DNS
DRUIDVERSION
@@ -94,6 +96,7 @@ ETL
Elasticsearch
Enums
FirehoseFactory
+FIRST_VALUE
FlattenSpec
Float.NEGATIVE_INFINITY
Float.NEGATIVE_INFINITY.
@@ -416,6 +419,7 @@ nonfinalized
non-null
non-nullable
noop
+NTILE
numerics
numShards
parameterize
@@ -427,6 +431,7 @@ partitioner
partitionFunction
partitionsSpec
pathParts
+PERCENT_RANK
performant
plaintext
pluggable
@@ -489,6 +494,7 @@ repo
requireSSL
rollup
rollups
+ROW_NUMBER
rsync
runtime
schemas
diff --git a/website/sidebars.json b/website/sidebars.json
index e3178726ce5..0b5bbc6e845 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -122,6 +122,7 @@
"label": "Druid SQL",
"items": [
"querying/sql",
+ "querying/sql-functions",
"querying/tips-good-queries",
"querying/query-deep-storage",
"querying/sql-data-types",
@@ -132,7 +133,6 @@
"querying/sql-array-functions",
"querying/sql-multivalue-string-functions",
"querying/sql-json-functions",
- "querying/sql-functions",
"querying/sql-query-context",
"querying/sql-metadata-tables",
"querying/sql-translation"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]