[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL

Tyler Hobbs (JIRA) Fri, 03 Oct 2014 14:51:07 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158569#comment-14158569 ]


Tyler Hobbs commented on CASSANDRA-4914:
----------------------------------------

I'm thinking a bit about making this compatible with UDFs.  The problem 
with this approach is that it relies on state that's not visible to the 
aggregation functions.

An alternative that would be (more easily) compatible with UDFs is a 
reduce-style aggregation.  The reducer function takes two inputs: the current 
state and the next value.  You can optionally provide an initial state and a 
finalizer function that is called with the final state after reducing. UDTs, 
tuples, and collections should be sufficiently powerful to represent anything 
that's needed for state.
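
To make the shape of this concrete, here is a rough sketch in Python rather than CQL.  Nothing here is proposed syntax or an existing Cassandra API: the names aggregate, avg_reducer, and avg_finalizer are made up for illustration, and the (sum, count) tuple just stands in for whatever tuple or UDT would hold the state.

```python
from functools import reduce

# Sentinel so that "no initial state" can be told apart from None.
_NO_INITIAL = object()


def aggregate(values, reducer, initial_state=_NO_INITIAL, finalizer=None):
    """Hypothetical reduce-style aggregate: fold values into a state.

    reducer(state, value) returns the next state.  If no initial state
    is given, the first value seeds the state (and an empty input raises
    TypeError, as functools.reduce does).  The optional finalizer is
    called once with the final state.
    """
    if initial_state is _NO_INITIAL:
        state = reduce(reducer, values)
    else:
        state = reduce(reducer, values, initial_state)
    return state if finalizer is None else finalizer(state)


def avg_reducer(state, value):
    # State is a (running sum, row count) tuple.
    total, count = state
    return (total + value, count + 1)


def avg_finalizer(state):
    # Turn the final (sum, count) state into the average, or None if
    # no rows were seen.
    total, count = state
    return total / count if count else None


salaries = [10.1, 100.0, 1000.0]
total = aggregate(salaries, lambda state, value: state + value, 0.0)
average = aggregate(salaries, avg_reducer, (0.0, 0), avg_finalizer)
lowest = aggregate(salaries, min)  # no initial state, no finalizer
```

SUM and MIN only need the reducer, while AVG needs both the initial state and the finalizer, which is why it seems reasonable to keep those two optional.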

In fact, Postgres's approach to user-defined aggregation functions is almost 
exactly this: 
http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html.  I think we 
could slightly simplify their approach by inferring the data types.

> Aggregation functions in CQL
> ----------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Benjamin Lerer
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, 
> CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (Wide row of column 
> values of int, double, float etc).
> With some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for 
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
