[jira] [Commented] (CASSANDRA-9778) CQL support for time series aggregation

Brian Hess (JIRA) Tue, 19 Jan 2016 13:01:35 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107407#comment-15107407
 ]


 Brian Hess commented on CASSANDRA-9778:
----------------------------------------

[~blerer] - that really isn't an example of window functions (A/K/A window 
aggregates, window aggregate functions, etc).  That's really an example of a 
grouped aggregate with time functions (Floor, Minute, Hour, etc).  The 
cardinality of the output of this query is that the number of rows equals the 
number of groups.  Whereas, with window functions the cardinality of the output 
is that the number of rows equals the number of rows of input.

Let me simplify your trades example to daily stock prices with a schema of 
(symbol TEXT, transDate DATE, closePrice DOUBLE).  For each stock you'd like 
the sliding 3-day average of the stock closing prices.  You would do that with 
the following SQL-99 syntax:
SELECT symbol, transDate, closePrice, Avg(closePrice) OVER (PARTITION BY symbol 
ORDER BY closePrice ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS 
threeDayAverage FROM stocks WHERE symbol = 'XYZ';

Here, each day will have a "window" of rows that stretches from 2 rows before 
(if they exist) to the current row, and the value is the average of the three 
closePrice values.  Thus, there is an output for every row of input.  For 
Thursday's threeDayAverage for stock XYZ we will use the closePrice from 
Tuesday, Wednesday, and Thursday.  For Friday's threeDayAverage for stock XYZ 
we will use the closePrice from Wednesday, Thursday, and Friday.  And so on.

For what it's worth, this is not hard to do if there is a partition key 
predicate.  We are simply doing a pass over the rows to return to the client 
and rolling things up.  It is possible we need to sort the data depending on 
the ORDER BY clause, but otherwise the aggregation is a simple rollup.  It 
should be noted that SQL allows for very flexible window specifications that 
can cause trouble, such as
"OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN CURRENT ROW AND 
UNBOUNDED FOLLOWING)"
which would go from the current row to the end of the partition.  That can be a 
tricky case.  SQL99 also supports RANGE window specifications in addition to 
ROW specifications.  That can also be tricky.

That said, window functions would be a nice addition (especially with a 
partition key predicate).


> CQL support for time series aggregation
> ---------------------------------------
>
>                 Key: CASSANDRA-9778
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9778
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>            Assignee: Benjamin Lerer
>             Fix For: 3.x
>
>
> Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a 
> common design pattern in cassandra applications.  I'd like to add CQL support 
> for this along these lines:
> {code}
> CREATE MATERIALIZED VIEW stocks_by_hour AS
> SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume)
> FROM stocks
> GROUP BY exchange, day, symbol, hour
> PRIMARY KEY  ((exchange, day), hour, symbol);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9778) CQL support for time series aggregation

Reply via email to