[
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173551#comment-15173551
]
Sylvain Lebresne commented on CASSANDRA-10707:
----------------------------------------------
bq. The first point that needs discussing is that this patch changes the
inter-node protocol and this without bumping the protocol version
We had some offline discussion about this, and the consensus seems to be that
we'll leave it as is for this patch with just a fat warning in the NEWS file
that you shouldn't use {{GROUP BY}} until you've fully upgraded. As said above,
this is not perfect if someone don't follow that arguably intuitive instruction
but this'll do for this time. In the meantime, we'll fix the protocol
deserialization so it doesn't drop the connection if a message has more than it
expects, but just skips the message remainder. Longer term, we should introduce
at least major and minor versioning for the messaging protocol so we can deal
with this in a better way.
bq. the operation (filtering and ordering) commute. That's not really the case
for {{ORDER BY}} and {{GROUP BY}}.
Actually, I guess the grouping itself commutes, it's more the aggregation that
depend on the order. So nevermind, I'm good with sticking to the SQL syntax.
bq. Having a GroupMaker implementation for normal queries simplify the code as
the same algorythm can be used for the 3 scenarios.
I read the code too quickly, sorry, but still, I meant to go the same route
than for {{GroupSpecification.NO_GROUPING}}. The naming is equally confusing
imo and the code simplification is pretty detatable: we reference that
{{GroupMaker}} in {{Selection}} (only place where {{NO_GROUPING}} can be used I
believe) twice, so using some {{groupMaker != null}} won't make a big
difference.
bq. Even if we allow functions the {{lastClustering}} will always be a the last
clustering.
Fair enough, though I still feel like grouping it with the partition key in a
{{GroupMaker.State}} would be a tad cleaner. And at the very least, why use a
{{ByteBuffer[]}} for the clustering instead of {{Clustering}} which is a more
explicit?
bq. I tried the approach that you suggested but without success
I'll try to have a look in the coming days because I do feel it would be
cleaner and it ought to be possible but ...
bq. Performing the modification outside of the {{CQLGroupByLimits}} is probably
possible but will force us to modify the {{DataLimits}} and {{QueryPager}}
interfaces to expose the {{rowCount}}.
... I might be misunderstanding what this imply but that doesn't sound
particularly bad to me.
A few other remarks:
* In the news file, you have {{IN restrictions with only one element are now
considered as equality restrictions}}. What does that mean for the user?
* Could remove {{CFMetadaData.primaryKeyColumns()}} now that it's unused.
* The comment in {{DataLimits.CQLLimits.hasEnoughLiveData}} still misses some
part, it used to (and should) read {{Getting that precise _number forces_ us
...}}.
* Forgot that initially, but in {{DataLimits.CQLGroupByLimits.forPaging(int,
ByteBuffer, int)}}, it's fishy to me that we use the partition key in
parameters but reuse the pre-existing {{lastClustering}}. If we're guaranteed
than {{lastReturnedKey == lastPartitionKey}} then we should assert it as it's
not immediatly obvious, otherwise this is wrong.
> Add support for Group By to Select statement
> --------------------------------------------
>
> Key: CASSANDRA-10707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: Benjamin Lerer
> Assignee: Benjamin Lerer
>
> Now that Cassandra support aggregate functions, it makes sense to support
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP
> BY partitionKey, clustering0, clustering1;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)