[ 
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066337#comment-15066337
 ] 

Benjamin Lerer commented on CASSANDRA-10707:
--------------------------------------------

The main difficulty of the ticket is paging. Between the client and the 
coordinator node, pages are returned based on the grouping, but internally 
the data is paged by number of rows. 
For example, if a {{GROUP BY}} query is used with a page size of 5000, the 
first page returned to the client must contain the aggregates for the first 
5000 groups, or fewer if there are fewer than 5000 groups. As these groups can 
be composed of a large number of rows, to avoid OOM errors the coordinator 
node needs to request the data from the other nodes page by page until it has 
enough groups. One of the problems is that it is only possible to be sure 
that a group is complete when the next group is reached or the data is exhausted.
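
To make the mismatch concrete, here is a minimal sketch in Python of paging by 
groups on top of paging by rows. It is not the Cassandra implementation: the 
names {{fetch_row_pages}} and {{group_pages}} are hypothetical, the replicas are 
replaced by an in-memory list, rows are assumed to arrive sorted by group key, 
and {{max}} stands in for the aggregate.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]    # (group key, value); assumed sorted by group key
Group = Tuple[str, int]  # (group key, max(value))


def fetch_row_pages(rows: List[Row], rows_per_page: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for internal paging: pages are cut by row count."""
    for start in range(0, len(rows), rows_per_page):
        yield rows[start:start + rows_per_page]


def group_pages(row_pages: Iterable[List[Row]],
                groups_per_page: int) -> Iterator[List[Group]]:
    """Yield client pages holding at most groups_per_page aggregated groups."""
    page: List[Group] = []
    current_key = None  # key of the group still being accumulated
    current_max = 0
    for rows in row_pages:  # row pages are pulled lazily, one at a time
        for key, value in rows:
            if key == current_key:
                current_max = max(current_max, value)
                continue
            # A new key was reached: only now is the previous group complete.
            if current_key is not None:
                page.append((current_key, current_max))
                if len(page) == groups_per_page:
                    yield page
                    page = []
            current_key, current_max = key, value
    # Data exhausted: the last open group is complete as well.
    if current_key is not None:
        page.append((current_key, current_max))
    if page:
        yield page


rows = [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("c", 7), ("c", 9), ("d", 4)]
result = list(group_pages(fetch_row_pages(rows, rows_per_page=2),
                          groups_per_page=2))
```

Note that group "a" spans two row pages here, and that only the running 
aggregate of the open group is kept in memory rather than its rows.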

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support 
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the 
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)