[
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167513#comment-15167513
]
Jon Haddad edited comment on CASSANDRA-10707 at 2/25/16 6:46 PM:
-----------------------------------------------------------------
I don't think changing the order of ORDER BY and GROUP BY is self-explanatory,
so it doesn't really offer any benefit, imo. If I were trying out the feature
I'd mostly be annoyed by its difference from something I've got muscle memory
for.
If you wanted to be technically accurate about it, SQL is declarative. The
order in which you specify the predicates, for instance, doesn't matter; it
just happens to line up with how we mentally process it. If you change the
order of predicates in your WHERE clause, you'll still end up with the same
query result.
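To make the point about predicate order concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and rows are made up for illustration, and this is SQLite rather than Cassandra, so it only shows the declarative behavior of a SQL engine:

```python
import sqlite3

# In-memory SQLite database with a hypothetical top_scores table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE top_scores (game_id INTEGER, username TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO top_scores VALUES (?, ?, ?)",
    [(5, "alice", 90), (5, "bob", 40), (6, "carol", 95), (5, "dave", 75)],
)

# Same two predicates, written in opposite order.
a = conn.execute(
    "SELECT username FROM top_scores "
    "WHERE game_id = 5 AND score > 50 ORDER BY username"
).fetchall()
b = conn.execute(
    "SELECT username FROM top_scores "
    "WHERE score > 50 AND game_id = 5 ORDER BY username"
).fetchall()

# Both queries return the same rows regardless of predicate order.
print(a == b)
```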
Assuming I'm understanding the implementation correctly, what you're saying is
that the query behaves more like the following:
{code}
select * from
( select * from table order by some_field limit 100)
group by x,y,z
{code}
Is this correct, or am I missing something? If it's the case, I hope this
doesn't box us in later down the line if we want to add support for other
operations (like subqueries). If we're going to introduce more
inconsistencies with SQL (which may be totally fair, I'm just thinking out loud
here), we would want to put the GROUP BY after the LIMIT, since it's being
applied then. I'm not sure what this does to CQL in general, as now we've
implicitly made the decision to introduce clauses in an imperative fashion.
I'd rather not see new clauses added piece by piece with different rules
depending on the context; that definitely won't make things any easier.
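As a rough illustration of why it matters whether the limit is applied before or after the aggregation, the sketch below runs both readings in SQLite through Python's sqlite3 module. The scores table and its rows are hypothetical, and this says nothing about what the Cassandra patch actually does; it only shows that the two orderings can produce different answers:

```python
import sqlite3

# In-memory SQLite database with a hypothetical scores table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (state TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("CA", 100), ("CA", 90), ("NY", 80), ("NY", 70), ("TX", 60)],
)

# Reading 1: the limit is applied first (as a subquery), then rows are grouped.
limit_then_group = conn.execute(
    "SELECT state, count(*) FROM "
    "(SELECT * FROM scores ORDER BY score DESC LIMIT 3) "
    "GROUP BY state ORDER BY state"
).fetchall()

# Reading 2: rows are grouped first, then the limit caps the number of groups.
group_then_limit = conn.execute(
    "SELECT state, count(*) FROM scores "
    "GROUP BY state ORDER BY state LIMIT 3"
).fetchall()

print(limit_then_group)
print(group_then_limit)
```

With this data the first reading only sees the top three rows, so TX never appears and NY is undercounted, while the second reading counts every row before limiting.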
So my question is, is CQL a declarative language or not? Will this ever be
something we intend to allow:
{code}
select username, score, state, count(state) as c from top_scores where game_id=5
limit 1000 group by state order by c desc limit 5;
{code}
I don't think the above query works at all. The aggregation is clearly a
declarative clause.
Now, if the behavior of limit before aggregation is the right decision, that I
might have to argue with.
was (Author: rustyrazorblade):
I don't think changing the order of ORDER BY and GROUP BY is self-explanatory,
so it doesn't really offer any benefit, imo. If I were trying out the feature
I'd mostly be annoyed by its difference from something I've got muscle memory
for.
If you wanted to be technically accurate about it, SQL is declarative. The
order in which you specify the clauses doesn't matter; it just happens to line
up with how we mentally process it. If you change the order of predicates in
your WHERE clause, you'll still end up with the same query result.
Assuming I'm understanding the implementation correctly, what you're saying is
that the query behaves more like the following:
{code}
select * from
( select * from table order by some_field limit 100)
group by x,y,z
{code}
Is this correct, or am I missing something? If it's the case, I hope this
doesn't box us in later down the line if we want to add support for other
operations (like subqueries). If we're going to introduce more
inconsistencies with SQL (which may be totally fair, I'm just thinking out loud
here), we would want to put the GROUP BY after the LIMIT, since it's being
applied then. I'm not sure what this does to CQL in general, as now we've
implicitly made the decision to introduce clauses in an imperative fashion.
I'd rather not see new clauses added piece by piece with different rules
depending on the context; that definitely won't make things any easier.
So my question is, is CQL a declarative language or not? Will this ever be
something we intend to allow:
{code}
select username, score, state, count(state) as c from top_scores where game_id=5
limit 1000 group by state order by c desc limit 5;
{code}
I don't think the above query works at all. The aggregation is clearly a
declarative clause.
Now, if the behavior of limit before aggregation is the right decision, that I
might have to argue with.
> Add support for Group By to Select statement
> --------------------------------------------
>
> Key: CASSANDRA-10707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: Benjamin Lerer
> Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP
> BY partitionKey, clustering0, clustering1;
> {code}