[ https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107407#comment-15107407 ]
Brian Hess commented on CASSANDRA-9778: ---------------------------------------- [~blerer] - that really isn't an example of window functions (A/K/A window aggregates, window aggregate functions, etc). That's really an example of a grouped aggregate with time functions (Floor, Minute, Hour, etc). The cardinality of the output of this query is that the number of rows equals the number of groups. Whereas, with window functions the cardinality of the output is that the number of rows equals the number of rows of input. Let me simplify your trades example to daily stock prices with a schema of (symbol TEXT, transDate DATE, closePrice DOUBLE). For each stock you'd like the sliding 3-day average of the stock closing prices. You would do that with the following SQL-99 syntax: SELECT symbol, transDate, closePrice, Avg(closePrice) OVER (PARTITION BY symbol ORDER BY closePrice ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS threeDayAverage FROM stocks WHERE symbol = 'XYZ'; Here, each day will have a "window" of rows that stretches from 2 rows before (if they exist) to the current row, and the value is the average of the three closePrice values. Thus, there is an output for every row of input. For Thursday's threeDayAverage for stock XYZ we will use the closePrice from Tuesday, Wednesday, and Thursday. For Friday's threeDayAverage for stock XYZ we will use the closePrice from Wednesday, Thursday, and Friday. And so on. For what it's worth, this is not hard to do if there is a partition key predicate. We are simply doing a pass over the rows to return to the client and rolling things up. It is possible we need to sort the data depending on the ORDER BY clause, but otherwise the aggregation is a simple rollup. It should be noted that SQL allows for very flexible window specifications that can cause trouble, such as "OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)" which would go from the current row to the end of the partition. That can be a tricky case. SQL99 also supports RANGE window specifications in addition to ROW specifications. That can also be tricky. That said, window functions would be a nice addition (especially with a partition key predicate). > CQL support for time series aggregation > --------------------------------------- > > Key: CASSANDRA-9778 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9778 > Project: Cassandra > Issue Type: New Feature > Components: CQL > Reporter: Jonathan Ellis > Assignee: Benjamin Lerer > Fix For: 3.x > > > Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a > common design pattern in cassandra applications. I'd like to add CQL support > for this along these lines: > {code} > CREATE MATERIALIZED VIEW stocks_by_hour AS > SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume) > FROM stocks > GROUP BY exchange, day, symbol, hour > PRIMARY KEY ((exchange, day), hour, symbol); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)