[jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement

Benjamin Lerer (JIRA) Fri, 01 Jan 2016 13:37:07 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076378#comment-15076378
 ]


Benjamin Lerer edited comment on CASSANDRA-10707 at 1/1/16 9:36 PM:
--------------------------------------------------------------------

Both will be supported.
What will not be supported is a {{group by}} clause in which only part of the 
partition key is specified. For example, if a table has a primary key like 
{{PRIMARY KEY((partitionKey1, partitionKey2), clustering1, clustering2)}}, the 
following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}
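
To make the restriction concrete, here is a minimal sketch of such a table with 
one query on each side of the rule. The column types are assumptions chosen 
only for illustration, and the supported/unsupported labels simply restate the 
behaviour proposed in this comment:
{code}
-- Hypothetical schema: column types are assumed, not taken from the ticket.
CREATE TABLE myTable (
    partitionKey1 int,
    partitionKey2 int,
    clustering1 int,
    clustering2 int,
    value int,
    PRIMARY KEY ((partitionKey1, partitionKey2), clustering1, clustering2)
);

-- Groups on the complete partition key: expected to be supported.
SELECT partitionKey1, partitionKey2, MAX(value) FROM myTable
GROUP BY partitionKey1, partitionKey2;

-- Groups on only one of the two partition key columns: not supported.
SELECT partitionKey1, MAX(value) FROM myTable
GROUP BY partitionKey1;
{code}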

As with aggregates, the grouping will be performed on the coordinator node. 
Consequently, if the driver uses the token-aware policy, a query containing a 
partition key predicate will be more efficient, because the aggregates will be 
built on the node where the data is located.

From the syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE 
partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE 
partitionKey=5 GROUP BY clusteringColumn1;}} will both be supported because 
the {{partitionKey}} column is restricted by an {{=}} operator.


was (Author: blerer):
Both will be supported.
What will not be supported is a {{group by}} clause were only a part of the 
partition key will be specified. For example, if a table has a primary key like 
{{PRIMARY KEY((partitionKey1, partitionKey2) clustering1, clustering2)}}, the 
following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}

As for the aggregates, the grouping will be performed on the coordinator node. 
By consequence, if the driver use the Token aware policy, a query containing a 
partition key predicate will be more efficient as the aggregates will be built 
on the node where the data are located.

From the syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE 
partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and  {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE 
partitionKey=5 GROUP BY clusteringColumn1;}} will be both supported due to the 
fact that the {{partitionKey}} column is restricted by an {{=}} operator.

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support 
> {{GROUP BY}} in {{SELECT}} statements.
> It should be possible to group either at the partition level or at the 
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable
> GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
