vtlim commented on code in PR #15902: URL: https://github.com/apache/druid/pull/15902#discussion_r1516891981
########## docs/querying/sql-window-functions.md: ########## @@ -1,6 +1,7 @@ --- id: sql-window-functions title: Window functions +description: Reference for Review Comment: ```suggestion description: Reference for window functions ``` ########## docs/querying/sql-window-functions.md: ########## @@ -88,40 +89,181 @@ You can use the OVER clause to treat other Druid aggregation functions as window Window functions support aliasing. -## Define a window with the OVER clause +## Window function syntax + +In general, window functions in Druid use the following syntax: + + +```sql +window_function() OVER ( + [PARTITION BY partitioning expression] + [ORDER BY order expression] + [[ROWS, RANGE] BETWEEN range start AND range end]) +FROM table +GROUP BY dimensions +``` + +```sql +window_function() OVER w +FROM table +WINDOW w AS ([PARTITION BY partitioning expression] [ORDER BY order expression] + [[ROWS, RANGE] BETWEEN range start AND range end]) +GROUP BY dimensions +``` The OVER clause defines the query windows for window functions as follows: -- PARTITION BY indicates the dimension that defines the rows within the window -- ORDER BY specifies the order of the rows within the windows. +- PARTITION BY indicates the dimension that defines window boundaries +- ORDER BY specifies the order of the rows within the windows + +An empty OVER clause or the absence of a PARTITION BY clause indicates that all data belongs to a single window. + +In the following example, the following OVER clause example sets the window dimension to `channel` and orders the results by the absolute value of `delta` ascending: + +```sql +... +RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC) +... +``` + +Window frames, set in ROWS and RANGE expressions, limit the set of rows used for the windowed aggregation. + +ROWS and RANGE accept the following values for `range start` and `range end`: +- UNBOUND PRECEDING: from the beginning of the window as ordered by the order expression +- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order expression +- CURRENT ROW: the current row +- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order expression +- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression Review Comment: Note that this one says UNBOUNDED but line 131 says UNBOUND. Should they be the same or is this intentional? ########## docs/querying/sql-window-functions.md: ########## @@ -88,40 +89,181 @@ You can use the OVER clause to treat other Druid aggregation functions as window Window functions support aliasing. -## Define a window with the OVER clause +## Window function syntax + +In general, window functions in Druid use the following syntax: + + +```sql +window_function() OVER ( + [PARTITION BY partitioning expression] + [ORDER BY order expression] + [[ROWS, RANGE] BETWEEN range start AND range end]) +FROM table +GROUP BY dimensions +``` + +```sql +window_function() OVER w +FROM table +WINDOW w AS ([PARTITION BY partitioning expression] [ORDER BY order expression] + [[ROWS, RANGE] BETWEEN range start AND range end]) +GROUP BY dimensions +``` The OVER clause defines the query windows for window functions as follows: -- PARTITION BY indicates the dimension that defines the rows within the window -- ORDER BY specifies the order of the rows within the windows. +- PARTITION BY indicates the dimension that defines window boundaries +- ORDER BY specifies the order of the rows within the windows + +An empty OVER clause or the absence of a PARTITION BY clause indicates that all data belongs to a single window. + +In the following example, the following OVER clause example sets the window dimension to `channel` and orders the results by the absolute value of `delta` ascending: + +```sql +... +RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC) +... +``` + +Window frames, set in ROWS and RANGE expressions, limit the set of rows used for the windowed aggregation. + +ROWS and RANGE accept the following values for `range start` and `range end`: +- UNBOUND PRECEDING: from the beginning of the window as ordered by the order expression +- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order expression +- CURRENT ROW: the current row +- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order expression +- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression Review Comment: ```suggestion - UNBOUNDED FOLLOWING: to the end of the window as ordered by the order expression ``` ########## docs/querying/sql-window-functions.md: ########## @@ -338,15 +385,45 @@ WINDOW cumulative AS ( ) ``` +<details> +<summary> View results </summary> + +| channel | time_hour | hourly_channel_changes | cumulative_activity_in_channel | csum5 | count5 | +| --- | --- | --- | --- | --- | --- | --- | --- | +| #en.wikipedia | 2016-06-27T00:00:00.000Z | 74996 | 74996 | 74996 | 1 | +| #en.wikipedia | 2016-06-27T01:00:00.000Z | 24150 | 99146 | 99146 | 2 | +| #en.wikipedia | 2016-06-27T02:00:00.000Z | 102372 | 201518 | 201518 | 3 | +| #en.wikipedia | 2016-06-27T03:00:00.000Z | 61362 | 262880 | 262880 | 4 | +| #en.wikipedia | 2016-06-27T04:00:00.000Z | 61666 | 324546 | 324546 | 5 | +| #en.wikipedia | 2016-06-27T05:00:00.000Z | 144199 | 468745 | 393749 | 5 | +| #en.wikipedia | 2016-06-27T06:00:00.000Z | 33414 | 502159 | 403013 | 5 | +| #en.wikipedia | 2016-06-27T07:00:00.000Z | 79397 | 581556 | 380038 | 5 | +| #en.wikipedia | 2016-06-27T08:00:00.000Z | 104436 | 685992 | 423112 | 5 | +| #en.wikipedia | 2016-06-27T09:00:00.000Z | 58020 | 744012 | 419466 | 5 | +| #en.wikipedia | 2016-06-27T10:00:00.000Z | 93904 | 837916 | 369171 | 5 | +| #en.wikipedia | 2016-06-27T11:00:00.000Z | 74436 | 912352 | 410193 | 5 | +| #en.wikipedia | 2016-06-27T12:00:00.000Z | 83491 | 995843 | 414287 | 5 | +| #en.wikipedia | 2016-06-27T13:00:00.000Z | 103051 | 1098894 | 412902 | 5 | +| #en.wikipedia | 2016-06-27T14:00:00.000Z | 211411 | 1310305 | 566293 | 5 | +| #en.wikipedia | 2016-06-27T15:00:00.000Z | 101247 | 1411552 | 573636 | 5 | +| #en.wikipedia | 2016-06-27T16:00:00.000Z | 189765 | 1601317 | 688965 | 5 | +| #en.wikipedia | 2016-06-27T17:00:00.000Z | 74404 | 1675721 | 679878 | 5 | +| #en.wikipedia | 2016-06-27T18:00:00.000Z | 104824 | 1780545 | 681651 | 5 | +| #en.wikipedia | 2016-06-27T19:00:00.000Z | 71268 | 1851813 | 541508 | 5 | +| #en.wikipedia | 2016-06-27T20:00:00.000Z | 88185 | 1939998 | 528446 | 5 | +| #en.wikipedia | 2016-06-27T21:00:00.000Z | 42584 | 1982582 | 381265 | 5 | + +</details> + The example defines multiple window specifications in the WINDOW clause that you can use for various window function calculations. The query uses two windows: -- cumulative is partitioned by channel and includes all rows from the beginning of partition up to the current row as ordered by `__time` to enable cumulative aggregation -- `moving5` is also partitioned by channel but only includes up to the last 4 rows and the current row as ordered by time +- `cumulative` is partitioned by channel and includes all rows from the beginning of partition up to the current row as ordered by `__time` to enable cumulative aggregation +- `moving5` is also partitioned by channel but only includes up to the last four rows and the current row as ordered by time -The number of rows considered for the `moving5` window for the count 5 column: -- starts at 1 because there are no rows before the current one -- grows up to 5 as defined by ROWS BETWEEN 4 PRECEDING AND CURRENT ROW +The number of rows considered for the `moving5` window for the `count5` column: +- starts at a single row because there are no rows before the current one +- grows up to five rows as defined by `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW` Review Comment: ```suggestion - grows up to five rows as defined by `ROWS BETWEEN 4 ROWS PRECEDING AND CURRENT ROW` ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org