[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL

Tyler Hobbs (JIRA) Tue, 17 Feb 2015 08:34:58 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324408#comment-14324408 ]


Tyler Hobbs commented on CASSANDRA-4914:
----------------------------------------

bq. I don't know the internals but it should be doable to push the aggregation 
function to the partitions without requiring the data interface to understand 
CQL.

The problem with pushing aggregate calculation down to the replicas is that 
there's no conflict resolution between them, so each replica's aggregate can 
be computed over stale or deleted data.  That may be acceptable if you're 
reading at consistency level ONE, but then we're dealing with a limited, 
special case.
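
To illustrate the point, here is a minimal sketch in Python. The replica 
model, the timestamps, and the helper names (local_sum, reconcile) are 
hypothetical and are not Cassandra internals; the sketch only assumes 
last-write-wins reconciliation with tombstones for deletes.

```python
# Hypothetical model: each replica maps a row key to (write_timestamp, value).
# A value of None stands for a tombstone (a deleted row).
replica_a = {"k1": (4, 30.0), "k2": (5, 100.0)}  # has the newest k1, missed the delete of k2
replica_b = {"k1": (1, 10.0), "k2": (7, None)}   # has a stale k1, saw the delete of k2


def local_sum(rows):
    """Sum the live values of one row set, with no knowledge of other replicas."""
    return sum(value for _, value in rows.values() if value is not None)


def reconcile(*replicas):
    """Merge replicas, keeping the cell with the highest timestamp per key."""
    merged = {}
    for replica in replicas:
        for key, cell in replica.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return merged


# Aggregate pushed down to each replica: computed before any reconciliation.
pushed_down = [local_sum(replica_a), local_sum(replica_b)]

# Aggregate computed by a coordinator after reconciling the raw rows.
coordinator_sum = local_sum(reconcile(replica_a, replica_b))

print(pushed_down, coordinator_sum)
```

In this toy case neither per-replica sum matches the reconciled one, and the 
two partial results cannot be combined into the right answer without shipping 
the underlying rows anyway.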

bq. Note that all agg functions are eminently parallelizable

I don't believe this is true.  Off the top of my head, computing the median of 
a dataset is not really parallelizable (without some sort of internode 
communication).
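
A small sketch of the difference, in Python with made-up partition contents: 
a sum can be combined from per-partition sums, but a median cannot be 
combined from per-partition medians.

```python
from statistics import median

# Hypothetical data split across two partitions of unequal size.
partitions = [[1, 2, 3], [4, 5, 6, 7, 8, 100, 200]]
all_values = [v for part in partitions for v in part]

# Sum decomposes: combining the partial sums gives the global sum.
sum_of_sums = sum(sum(part) for part in partitions)
global_sum = sum(all_values)

# Median does not: the median of the partial medians is not the global median.
median_of_medians = median(median(part) for part in partitions)
global_median = median(all_values)

print(sum_of_sums, global_sum)           # equal
print(median_of_medians, global_median)  # differ
```

Getting the exact median requires either moving the values themselves or 
several rounds of exchange between nodes, which is the internode 
communication mentioned above.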

bq. dealing with consistency is tricky but then Cassandra is by design 
eventually consistent so why not have eventually consistent aggregations. Just 
pick a partition and aggregate on that. With large datasets an average 
differing at the sixth decimal won't really matter.

That may be acceptable for aggregates like average, but other aggregates may 
require precision.

With all of that said, I wouldn't necessarily be opposed to supporting 
selecting a sampling of data from a table (and allowing an aggregate to be run 
over that), but I suggest opening a new ticket for that discussion.

> Aggregation functions in CQL
> ----------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Benjamin Lerer
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, 
> CASSANDRA-4914-V4.txt, CASSANDRA-4914-V5.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (Wide row of column 
> values of int, double, float etc).
> With some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc (for 
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
