[jira] [Comment Edited] (CASSANDRA-4914) Aggregate functions in CQL

Pablo Chacin (JIRA) Mon, 23 Dec 2013 01:41:40 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855523#comment-13855523 ]


Pablo Chacin edited comment on CASSANDRA-4914 at 12/23/13 9:39 AM:
-------------------------------------------------------------------

{quote} the aggregate function would be iteratively called on each grouping. Is 
that accurate?
Exactly. Basically I'm suggesting that the grouping itself can be done 
separately by the Selection object since it's completely generic, it doesn't 
depend on which aggregate function you'll execute.
{quote}

[~slebresne] I completely agree with this approach. Actually, I used this 
pattern, which is basically a visitor pattern, some time ago when implementing 
a (small-scale) time series plotting framework, and it worked nicely. You have a 
lot of freedom in how you traverse the data (e.g. grouping) with a generic set of 
functions.
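A minimal Java sketch of that separation, assuming a hypothetical `AggregateFunction` interface (`addInput`/`compute`) and a hypothetical `Grouping.groupAndAggregate` helper; none of these names are Cassandra's actual `Selection` or function API. The grouping step only partitions rows by key and hands each group's values to a fresh aggregate instance, so adding a new function means implementing the interface, not touching the grouping code. The rows are shaped like the `emp` example in the issue description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical aggregate interface: it sees the values of one group only
// and knows nothing about how the rows were grouped.
interface AggregateFunction {
    void addInput(double value);
    double compute();
}

final class Sum implements AggregateFunction {
    private double total = 0;
    public void addInput(double value) { total += value; }
    public double compute() { return total; }
}

final class Max implements AggregateFunction {
    private double max = Double.NEGATIVE_INFINITY;
    public void addInput(double value) { max = Math.max(max, value); }
    public double compute() { return max; }
}

// Row shaped like the emp table in the issue description (illustrative only).
final class Emp {
    final int empId;
    final int deptId;
    final double salary;

    Emp(int empId, int deptId, double salary) {
        this.empId = empId;
        this.deptId = deptId;
        this.salary = salary;
    }
}

final class Grouping {
    // Generic grouping step: partition rows by key (keeping first-seen order)
    // and feed each group's values to its own fresh aggregate instance.
    static <R, K> Map<K, Double> groupAndAggregate(
            List<R> rows,
            Function<R, K> groupKey,
            Function<R, Double> value,
            Supplier<AggregateFunction> newAggregate) {
        Map<K, AggregateFunction> perGroup = new LinkedHashMap<>();
        for (R row : rows) {
            perGroup.computeIfAbsent(groupKey.apply(row), k -> newAggregate.get())
                    .addInput(value.apply(row));
        }
        Map<K, Double> results = new LinkedHashMap<>();
        perGroup.forEach((k, agg) -> results.put(k, agg.compute()));
        return results;
    }
}

class GroupingDemo {
    public static void main(String[] args) {
        List<Emp> rows = List.of(
                new Emp(130, 3, 10.1),
                new Emp(130, 2, 100),
                new Emp(130, 1, 1000),
                new Emp(131, 1, 50));
        // Same grouping code, two different aggregates.
        System.out.println(Grouping.groupAndAggregate(rows, e -> e.empId, e -> e.salary, Sum::new));
        System.out.println(Grouping.groupAndAggregate(rows, e -> e.empId, e -> e.salary, Max::new));
    }
}
```

The aggregate is deliberately stateful and single-use: the grouping code asks the `Supplier` for a new instance per group, which keeps functions like `Sum` and `Max` free of any knowledge about group boundaries.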

In the case of Cassandra, however, one major consideration is that for 
partitionable functions like sum, count, min or max, each node could compute its 
part of the aggregation locally. For non-partitionable functions like average or 
percentiles, all the aggregation must be done at the coordinator.
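A minimal Java sketch of the distinction, assuming hypothetical `Partial` and `Coordinator` classes (not Cassandra code) and plain `double` values. Each replica would reduce its local rows to one `Partial` holding sum, count, min and max, and the coordinator merges these; merging is order-independent (up to floating-point rounding of the sum), so it does not matter in which order replicas respond. An exact percentile, shown here with the nearest-rank method, has no fixed-size mergeable state, so all raw values are gathered at the coordinator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-replica partial aggregate for the partitionable functions.
final class Partial {
    final double sum;
    final long count;
    final double min;
    final double max;

    Partial(double sum, long count, double min, double max) {
        this.sum = sum;
        this.count = count;
        this.min = min;
        this.max = max;
    }

    // Identity element: merging with it leaves any partial unchanged.
    static final Partial EMPTY =
            new Partial(0, 0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY);

    // What a replica would compute over its local rows.
    static Partial of(double[] localValues) {
        Partial acc = EMPTY;
        for (double v : localValues) {
            acc = acc.merge(new Partial(v, 1, v, v));
        }
        return acc;
    }

    // Order-independent combination (up to floating-point rounding of sum).
    Partial merge(Partial other) {
        return new Partial(sum + other.sum, count + other.count,
                Math.min(min, other.min), Math.max(max, other.max));
    }
}

final class Coordinator {
    // Partitionable case: only one small Partial per replica is needed.
    static Partial mergeAll(List<Partial> fromReplicas) {
        Partial acc = Partial.EMPTY;
        for (Partial p : fromReplicas) {
            acc = acc.merge(p);
        }
        return acc;
    }

    // Non-partitionable case: an exact percentile (nearest-rank method) cannot be
    // built from per-replica percentiles, so every raw value must reach the
    // coordinator. Assumes at least one value overall and 0 <= p <= 100.
    static double percentile(List<double[]> valuesFromReplicas, double p) {
        double[] all = valuesFromReplicas.stream()
                .flatMapToDouble(Arrays::stream)
                .sorted()
                .toArray();
        int rank = (int) Math.ceil(p / 100.0 * all.length); // 1-based rank
        return all[Math.max(rank, 1) - 1];
    }
}

class PartialDemo {
    public static void main(String[] args) {
        double[] replicaA = {10, 40};
        double[] replicaB = {20, 50, 30};
        Partial total = Coordinator.mergeAll(List.of(Partial.of(replicaA), Partial.of(replicaB)));
        System.out.println("sum=" + total.sum + " count=" + total.count
                + " min=" + total.min + " max=" + total.max);
        System.out.println("median=" + Coordinator.percentile(List.of(replicaA, replicaB), 50));
    }
}
```

The difference in network cost is the point of the split: `mergeAll` receives a constant-size object per replica, while `percentile` receives every value.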


was (Author: pablochacin):

{quote} the aggregate function would be iteratively called on each grouping. Is 
that accurate?
Exactly. Basically I'm suggesting that the grouping itself can be done 
separately by the Selection object since it's completely generic, it doesn't 
depend on which aggregate function you'll execute.
{quote}

[~Sylvain Lebresne] I completely agree with this approach. Actually, I used 
this pattern, which is basically a visitor pattern, some time ago when 
implementing a (small-scale) time series plotting framework, and it worked 
nicely. You have a lot of freedom in how you traverse the data (e.g. grouping) 
with a generic set of functions.

In the case of Cassandra, however, one major consideration is that for 
partitionable functions like sum, count, min or max, each node could compute its 
part of the aggregation locally. For non-partitionable functions like average or 
percentiles, all the aggregation must be done at the coordinator.

> Aggregate functions in CQL
> --------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Vijay
>             Fix For: 2.1
>
>
> The requirement is to do aggregation of data in Cassandra (wide rows of column 
> values of type int, double, float, etc.), with some basic aggregate functions 
> like AVG, SUM, Mean, Min, Max, etc. (for the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
