[
https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107407#comment-15107407
]
Brian Hess commented on CASSANDRA-9778:
----------------------------------------
[~blerer] - that really isn't an example of window functions (A/K/A window
aggregates, window aggregate functions, etc). That's really an example of a
grouped aggregate with time functions (Floor, Minute, Hour, etc). The
cardinality of the output of this query is that the number of rows equals the
number of groups. Whereas, with window functions the cardinality of the output
is that the number of rows equals the number of rows of input.
Let me simplify your trades example to daily stock prices with a schema of
(symbol TEXT, transDate DATE, closePrice DOUBLE). For each stock you'd like
the sliding 3-day average of the stock closing prices. You would do that with
the following SQL-99 syntax:
SELECT symbol, transDate, closePrice, Avg(closePrice) OVER (PARTITION BY symbol
ORDER BY closePrice ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS
threeDayAverage FROM stocks WHERE symbol = 'XYZ';
Here, each day will have a "window" of rows that stretches from 2 rows before
(if they exist) to the current row, and the value is the average of the three
closePrice values. Thus, there is an output for every row of input. For
Thursday's threeDayAverage for stock XYZ we will use the closePrice from
Tuesday, Wednesday, and Thursday. For Friday's threeDayAverage for stock XYZ
we will use the closePrice from Wednesday, Thursday, and Friday. And so on.
For what it's worth, this is not hard to do if there is a partition key
predicate. We are simply doing a pass over the rows to return to the client
and rolling things up. It is possible we need to sort the data depending on
the ORDER BY clause, but otherwise the aggregation is a simple rollup. It
should be noted that SQL allows for very flexible window specifications that
can cause trouble, such as
"OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING)"
which would go from the current row to the end of the partition. That can be a
tricky case. SQL99 also supports RANGE window specifications in addition to
ROW specifications. That can also be tricky.
That said, window functions would be a nice addition (especially with a
partition key predicate).
> CQL support for time series aggregation
> ---------------------------------------
>
> Key: CASSANDRA-9778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9778
> Project: Cassandra
> Issue Type: New Feature
> Components: CQL
> Reporter: Jonathan Ellis
> Assignee: Benjamin Lerer
> Fix For: 3.x
>
>
> Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a
> common design pattern in cassandra applications. I'd like to add CQL support
> for this along these lines:
> {code}
> CREATE MATERIALIZED VIEW stocks_by_hour AS
> SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume)
> FROM stocks
> GROUP BY exchange, day, symbol, hour
> PRIMARY KEY ((exchange, day), hour, symbol);
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)