[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement

Benjamin Lerer (JIRA) Wed, 20 Jan 2016 01:39:05 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108289#comment-15108289
 ]


Benjamin Lerer commented on CASSANDRA-10707:
--------------------------------------------

Given that we currently allow queries like {{SELECT max((*)), min((*)), 
count((*)) FROM myTable;}}, I do not think that we can support {{GROUP BY}} 
queries without allowing grouping by partition key.
Some users might be interested in queries like {{SELECT max((*)), min((*)), 
count((*)) FROM myTable GROUP BY partitionKey;}} or {{SELECT max((*)), min((*)), 
count((*)) FROM myTable WHERE partitionKey IN (1, 2, 3) GROUP BY partitionKey;}}

Now, it is clear that those queries are not recommended and that the timeouts 
will probably need to be adjusted. As with the current aggregate queries, a 
warning will be logged if the partition key is not restricted by an equality.

The problem is not really the work needed to compute the aggregates; it is 
that the data has to be retrieved from other nodes.

In the future, we might manage to push the aggregate computation to the 
replicas, but we are not there yet.
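
To make the cost argument concrete, here is a minimal sketch in Python. It is a 
toy model, not Cassandra code: the names REPLICAS, merge, 
aggregate_on_coordinator and aggregate_on_replicas are hypothetical, and it 
assumes each partition lives on exactly one replica. It computes the same max, 
min and count per partition key in two ways and counts how many items would 
cross the network in each case.

```python
# Toy model: each "replica" holds rows of (partition_key, value).
# Assumption: every partition is stored on exactly one replica.
REPLICAS = [
    [(1, 10), (1, 25), (2, 7), (2, 40)],
    [(3, 3), (3, 8), (3, 1)],
]


def merge(acc, part):
    """Combine two (max, min, count) partial aggregates."""
    return (max(acc[0], part[0]), min(acc[1], part[1]), acc[2] + part[2])


def aggregate_on_coordinator(replicas):
    """Ship every row to the coordinator, then group by partition key."""
    groups = {}
    transferred = 0
    for rows in replicas:
        for key, value in rows:
            transferred += 1  # each row crosses the network
            part = (value, value, 1)
            groups[key] = merge(groups[key], part) if key in groups else part
    return groups, transferred


def aggregate_on_replicas(replicas):
    """Each replica pre-aggregates and ships one partial result per group."""
    groups = {}
    transferred = 0
    for rows in replicas:
        local = {}
        for key, value in rows:
            part = (value, value, 1)
            local[key] = merge(local[key], part) if key in local else part
        for key, part in local.items():
            transferred += 1  # one partial aggregate per group per replica
            groups[key] = merge(groups[key], part) if key in groups else part
    return groups, transferred
```

Both functions return the same groups for this data. The first transfers one 
item per row, while the second transfers one item per group, which is the 
saving that pushing the computation to the replicas would aim for.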

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support 
> {{GROUP BY}} in {{SELECT}} statements.
> It should be possible to group either at the partition level or at the 
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
