soumyava commented on code in PR #15902:
URL: https://github.com/apache/druid/pull/15902#discussion_r1491393725
##########
docs/querying/sql-window-functions.md:
##########
@@ -88,40 +88,171 @@ You can use the OVER clause to treat other Druid
aggregation functions as window
Window functions support aliasing.
-## Define a window with the OVER clause
+## Window function syntax
+
+In general, Druid window functions use the following syntax:
+
+```sql
+SELECT
+ dimensions,
+ aggregation function(s),
+ window_function()
+ OVER ( PARTITION BY partitioning expression
+ ORDER BY order expression
+ frame clause
+ )
+ FROM table
+ GROUP BY dimensions
+```
The OVER clause defines the query windows for window functions as follows:
- PARTITION BY indicates the dimension that defines the rows within the window
- ORDER BY specifies the order of the rows within the windows.
+For example, the following OVER clause sets the window dimension to
`channel` and orders the results by the absolute value of `delta` ascending:
+
+```sql
+...
+RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC)
+...
+```
+
+Druid applies the GROUP BY dimensions before calculating all non-window
aggregation functions. Then it applies the window function over the aggregate
results.
+
:::note
Sometimes windows are called partitions. However, the partitioning for window
functions is a shuffle (partition) of the result set created at query time and
is not to be confused with Druid's segment partitioning feature, which
partitions data at ingest time.
:::
-The following OVER clause example sets the window dimension to `channel` and
orders the results by the absolute value of `delta` ascending:
+### ORDER BY windows
+
+When the window definition only specifies ORDER BY, Druid sorts the aggregate
data set and applies the function in that order.
+
+The following query uses ORDER BY SUM(delta) DESC to rank user hourly activity
from the most changed to the least changed within an hour:
```sql
-...
-RANK() OVER (PARTITION BY channel ORDER BY ABS(delta) ASC)
-...
+SELECT
+ TIME_FLOOR(__time, 'PT1H') as time_hour,
+ channel,
+ user,
+ SUM(delta) net_user_changes,
+ RANK( ) OVER ( ORDER BY SUM(delta) DESC ) editing_rank
+FROM "wikipedia"
+WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
+ AND __time BETWEEN '2016-06-27' AND '2016-06-28'
+GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
+ORDER BY 5
+```
+
+### PARTITION BY windows
+
+When a window only specifies PARTITION BY partition expression, Druid
calculates the aggregate window function over all the rows that share a value
within the selected dataset.
+
+The following example demonstrates a query that uses two different windows,
PARTITION BY channel and PARTITION BY user, to calculate the total activity in
the channel and the total activity by the user so that they can be compared to
individual hourly activity:
+
+```sql
+SELECT
+ TIME_FLOOR(__time, 'PT1H') as time_hour, channel, user,
+ SUM(delta) hourly_user_changes,
+ SUM(SUM(delta)) OVER (PARTITION BY user ) AS total_user_changes,
+ SUM(SUM(delta)) OVER (PARTITION BY channel ) AS total_channel_changes
+FROM "wikipedia"
+WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
+ AND __time BETWEEN '2016-06-27' AND '2016-06-28'
+GROUP BY TIME_FLOOR(__time, 'PT1H'),2,3
+ORDER BY channel,TIME_FLOOR(__time, 'PT1H'), user
+```
+
+The windows only define the PARTITION BY clause of the window, so Druid
performs the calculation over the whole dataset for each value of the partition
expression.
Review Comment:
This seems to repeat the earlier statement at line 150.
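
For reference, a minimal sketch of the behavior both sentences describe: aggregate with GROUP BY first, then sum those aggregates over each PARTITION BY value. This is not Druid itself. It assumes SQLite 3.25 or later (through Python's `sqlite3` module) as a stand-in engine, and the `edits` table, its rows, and the `window_totals` helper are hypothetical names made up for illustration.

```python
import sqlite3

# Hypothetical rows standing in for the "wikipedia" datasource:
# (time_hour, channel, user, delta)
ROWS = [
    ("2016-06-27T00", "#kk.wikipedia", "alice", 10),
    ("2016-06-27T00", "#kk.wikipedia", "alice", 5),
    ("2016-06-27T01", "#kk.wikipedia", "alice", 20),
    ("2016-06-27T00", "#lt.wikipedia", "bob", 7),
    ("2016-06-27T01", "#kk.wikipedia", "bob", 3),
]

QUERY = """
WITH hourly AS (
  -- Step 1: the GROUP BY aggregation runs first.
  SELECT time_hour, channel, user, SUM(delta) AS hourly_user_changes
  FROM edits
  GROUP BY time_hour, channel, user
)
-- Step 2: each window sums the aggregated rows that share a partition value.
SELECT time_hour, channel, user, hourly_user_changes,
       SUM(hourly_user_changes) OVER (PARTITION BY user) AS total_user_changes,
       SUM(hourly_user_changes) OVER (PARTITION BY channel) AS total_channel_changes
FROM hourly
ORDER BY channel, time_hour, user
"""


def window_totals(rows):
    """Load rows into an in-memory SQLite table and run the two-window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE edits (time_hour TEXT, channel TEXT, user TEXT, delta INTEGER)"
        )
        conn.executemany("INSERT INTO edits VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_totals(ROWS):
        print(row)
```

Every row for the same user carries the same `total_user_changes`, and every row for the same channel carries the same `total_channel_changes`, regardless of the hour. The CTE spells out the two steps separately; the query in the diff expresses the same thing in one pass with `SUM(SUM(delta)) OVER (...)`.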
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]