[
https://issues.apache.org/jira/browse/CASSANDRA-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620758#comment-14620758
]
Ajay edited comment on CASSANDRA-9767 at 7/9/15 4:11 PM:
---------------------------------------------------------
I raised this issue mainly to allow the selection of columns along with
count(*) (and other aggregates, which Cassandra supports as of 2.2). Though it
makes sense with GROUP BY, I didn't raise the issue to request GROUP BY support,
as such requests are usually shot down immediately with the advice to use Spark
for such cases. Now that cross-node aggregation is supported in 2.2 (thanks, I
did not know this before), it makes sense (and should not be very difficult) to
support GROUP BY/HAVING or similar in CQL as well.
was (Author: ajaygarga):
I raised this bug majorily for allowing the selection of columns along with
count (*) (other aggregates as Cassandra supporting it from 2,2).Though it make
sense with GROUP BY, I didn't raise the bug to GROUP BY as such issues usually
are shot down immediately saying use Spark for such cases. Now having supported
cross nodes aggregations in 2.2 (thanks. I was not knowing this before), it
make sense (and should not be much difficult) to support GROUP BY/HAVING or
similar in CQL as well.
> Allow the selection of columns together with aggregates
> -------------------------------------------------------
>
> Key: CASSANDRA-9767
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9767
> Project: Cassandra
> Issue Type: Wish
> Components: Core
> Environment: Cassandra 2.0.16
> Ubuntu 15.04
> Reporter: Ajay
> Assignee: Benjamin Lerer
> Priority: Minor
>
> Let's assume we have a column family as below:
> create table sample ( track_id int, user_id int, country varchar, primary key
> ((track_id), user_id));
> where track_id is the partition key.
> Now to aggregate the number of rows for a single track_id, we can query using
> CQL as below:
> select count(*) from sample where track_id = 1 and user_id = 1;
> But that will return only the count. If we need the other columns along with
> the count, we cannot query as below, as it throws an error:
> select count(*), country from sample where track_id = 1 and user_id = 1;
> Bad Request: line 1:15 mismatched input ',' expecting K_FROM.
> In this case, all rows for a given track_id and user_id will have the same
> value for country. So we should be able to query as above. Also in SQL, it
> is possible to select columns along with aggregate functions.
> Though I know that Cassandra is not an analytics engine (unlike Hadoop and
> Spark), we need some basic aggregate functions like min, max, avg, etc.
> Although it might not be efficient performance-wise, it is better done on the
> Cassandra side (as it uses the native protocol) than fetching all rows to the
> client and doing the basic aggregation there. Cassandra cannot be used just
> as a data store (garbage in, garbage out). In that context, CQL is currently
> pretty limited. Just to get data out of Cassandra, we will have to use Spark,
> though we will not be doing much analytics on it.
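
For illustration, the client-side workaround the description refers to
(fetching the rows and aggregating in the application) might look roughly like
the sketch below. It is only a sketch under stated assumptions: the Row type
and the count_with_country helper are hypothetical names, and the rows are
built in memory instead of being fetched with a real driver.

```python
from collections import namedtuple

# Hypothetical stand-in for rows returned by a query such as
# "select track_id, user_id, country from sample where track_id = 1 and user_id = 1;"
Row = namedtuple("Row", ["track_id", "user_id", "country"])


def count_with_country(rows):
    """Return (count, country) for rows that are expected to share one country.

    Hypothetical helper: this is the aggregation the client has to do itself
    when count(*) cannot be selected together with a regular column.
    """
    rows = list(rows)
    countries = {row.country for row in rows}
    if len(countries) > 1:
        raise ValueError("rows do not share a single country value")
    # No rows at all: count is 0 and there is no country to report.
    country = next(iter(countries), None)
    return len(rows), country


if __name__ == "__main__":
    # In-memory sample data; a real client would fetch these rows first.
    fetched = [Row(track_id=1, user_id=1, country="IN")]
    print(count_with_country(fetched))
```

Every matching row has to cross the network before it can be counted, which is
the cost the description argues should be avoided by aggregating on the
Cassandra side.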
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)