[
https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855523#comment-13855523
]
Pablo Chacin edited comment on CASSANDRA-4914 at 12/23/13 9:39 AM:
-------------------------------------------------------------------
{quote} the aggregate function would be iteratively called on each grouping. Is
that accurate?
Exactly. Basically I'm suggesting that the grouping itself can be done
separately by the Selection object since it's completely generic, it doesn't
depend on which aggregate function you'll execute.
{quote}
[~slebresne] I completely agree with this approach. Actually, I used this
pattern, which is basically a visitor pattern, some time ago when implementing
a (small-scale) time series plotting framework, and it worked nicely. You have a
lot of freedom in how you traverse the data (e.g. grouping) while reusing a
generic set of functions.
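As a rough illustration of that separation (a minimal sketch in Python, not Cassandra code; `group_rows`, `aggregate` and the sample rows are hypothetical names and data made up for this example), the grouping step knows nothing about the function that is later applied to each group:

```python
from collections import defaultdict


def group_rows(rows, key):
    """Generic grouping step: independent of any aggregate function."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


def aggregate(rows, key, column, func):
    """Call the aggregate function once per group produced by group_rows."""
    return {
        group_key: func([row[column] for row in group])
        for group_key, group in group_rows(rows, key).items()
    }


# Hypothetical sample rows, modelled on the example in the issue description.
rows = [
    {"empid": 130, "deptid": 3, "salary": 10.1},
    {"empid": 130, "deptid": 2, "salary": 100.0},
    {"empid": 130, "deptid": 1, "salary": 1000.0},
]

# The same grouping code serves any aggregate function.
totals = aggregate(rows, key=lambda r: r["empid"], column="salary", func=sum)
maxima = aggregate(rows, key=lambda r: r["empid"], column="salary", func=max)
```

Swapping `sum` for `max`, `min` or `len` requires no change to the grouping code, which is the point being made above.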
In the case of Cassandra, however, one major concern would be where the
aggregation runs: for partitionable functions like sum, count, min or max, each
node could do its part of the aggregation, whereas for non-partitionable
functions like average or percentiles, all the aggregation must be done at the
coordinator.
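To make the distinction concrete, here is a small sketch in plain Python (not Cassandra code; the per-node value lists are invented for the example). Partial sums, counts and maxima computed on each node can be merged at the coordinator, while per-node medians or per-node averages cannot simply be combined in the same way:

```python
from statistics import mean, median

# Hypothetical values held by three nodes (made-up data).
node_values = [[10.1, 100.0], [1000.0], [5.0, 7.0, 9.0]]

# Partitionable: each node reduces locally, the coordinator merges the partials.
total = sum(sum(values) for values in node_values)
count = sum(len(values) for values in node_values)
overall_max = max(max(values) for values in node_values)

# Reference results computed over all raw values in one place.
all_values = [x for values in node_values for x in values]
true_median = median(all_values)
true_mean = mean(all_values)

# Naively merging per-node results gives different answers here.
median_of_medians = median(median(values) for values in node_values)
mean_of_means = mean(mean(values) for values in node_values)
```

With this data the merged sum, count and max agree with the values computed over `all_values`, while `median_of_medians` and `mean_of_means` do not match `true_median` and `true_mean`.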
> Aggregate functions in CQL
> --------------------------
>
> Key: CASSANDRA-4914
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Vijay
> Assignee: Vijay
> Fix For: 2.1
>
>
> The requirement is to do aggregation of data in Cassandra (wide rows of column
> values of int, double, float, etc.),
> with some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |        joe |       doe |   10.1
>    130 |      2 |        joe |       doe |    100
>    130 |      1 |        joe |       doe |  1e+03
>
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>
>  sum(salary) | empid
> -------------+--------
>       1110.1 |    130