[
https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163608#comment-15163608
]
Sylvain Lebresne commented on CASSANDRA-10707:
----------------------------------------------
The first point that needs discussing is that this patch changes the inter-node
protocol and this without bumping the protocol version, which is something
we've never done.
On the one side bumping the protocol version is something we've kept for major
releases and so far planned not to do before 4.0. But on the other side, not
being able to add anything to the inter-node messages limits a lot new features
(like this one) and doesn't make sense in a tick-tock world.
We could accept this kind of change without bumping the version on the argument
that it's only "additive" to the serialization format and that it breaks
nothing if the new feature is not used, and we'd document that "you shall not
use GROUP BY until all nodes in the cluster are updated to a version that
supports it". But it's not something to treat lightly because if a user don't
respect that rule, older node will fail deserialization of the message, which
will drop the connection, which can impact other on-ongoing queries, which is
not good.
Alternatively, we could simply bump the protocol version, forgetting the "not
before 4.0" rule. The advantage is that if we do so we can have the sending
node (instead of the receiving one) throw if a GROUP BY is used but not all
nodes are up to date, which is cleaner and avoid the connection dropped
problem.
So something we need to decide first, and maybe we need a separate ticket to
have that discussion.
Anyway, outside of this question, some general remarks on the patch:
* In the parser, we have the {{GROUP BY}} before the {{ORDER BY}}. On the one
side it's more consistent with {{SQL}} but on the other side, we do happen to
order before we group. And we kind of know that post-query ordering is not
really something we can do due to paging (we do happen to do it in a few case
when paging is not involved out of backward compatibility but given the
limitation, I don't think we'll want to ever extend that) so it does feel the
{{ORDER BY}} should come before {{GROUP BY}}.
* The {{forPagingByQueryPager}} and {{forGroupByInternalPaging}} methods are
not very intuitive: it's unclear when just looking at {{DataLimits}} why they
exist and why you should use them or not. I'm also not entirely sure we need
the difference between the two. If I understand correctly, the problem is that
when a replica ends a page in the middle of a group, the coordinator don't know
if said replica has stopped because it reaches the end of a group which was the
last one to return (in which case we're done), or if the replica stopped
because of the page limit (but the group itself is not finished). But the
coordinator should be able to figure that out by checking it's row counts: if
after having consumed the page from the replica we've reached the group limit
_or_ there was less row in the page than the page limit, then we're done,
otherwise we should ask for more. Granted, that does mean in the rare case
where we've exhausted all data exactly on the page limit, we'll do an
additional query (that will be empty), but pretty sure you have that problem in
one form or another whatever you do. Am I missing something here?
* Not sure how I feel about {{GroupSelector}}. It's a bit confusing as of this
patch and while I get that its for allowing functions later, it might not be as
generic as we may need. For instance, if we ever want to support function on
multiple columns (which would be possible, assuming those are consecutive PK
columns), this won't work, at least not as implemented. I think I'd slightly
prefer removing {{GroupSelector}} for now and just rely on extending
{{GroupBySpecification}} for future extension (typically we'll add a
{{withFunctionGroupBySpec}} which would have all liberty as to how it is
implemented). Doing so also mean we can simplify the probably common and only
currently supported case as we just need to keep the length of the prefix we
group by rather than a list of (dumb) selector.
* I feel keeping/passing the {{lastPartitionKey}} and {{lastClustering}} in
{{DataLimits.CQLGroupByLimits}} and {{GroupMaker}} is not as future
proof/generic as the patch is otherwise trying to be. Typically, if we allow
functions, the {{lastClustering}} won't really be a clustering. Granted, it'll
still fit into a {{ByteBuffer[]}} but it'll be a bit of an abuse. So I would
suggest adding {{GroupMaker.State}} concept whose implementation would be
specific to the {{GroupMaker/GroupBySpecification}} but would be transparent to
other code. Imo that would make some code cleaner (for instance,
{{GroupMaker.GROUP_EVERYTHING}} wouldn't need to store anything, which makes
sense).
* {{GroupBySpeccification.NO_GROUPING}} feels confusing because it really means
the query is not an aggregate, while the naming makes it sound like it has no
{{GROUP BY}}. Similarly, the {{isGrouping}} method really mean
{{hasAggregate}}. In fact, I think it would be more clear to just have no
{{GroupBySpecification}} in that case, keeping it {{null}} in SelectStatement
(and in {{Selection}}). Having {{groupBySpec != null}} to mean no aggregate is
imo more clear than {{!groupBySpec.isGrouping()}}. I'm also confused as to why
{{GroupMaker.NO_GROUPING}} even exists, we shouldn't ever create a
{{GroupMaker}} in the first place if we're not aggregating at all.
* In {{DataLimits}}, the naming of {{noGroupByLimits}} (and
{{distinctNoGroupByLimits}}) is confusing (I mean, the both create a
{{CQLGroupByLimits}}!). I think we could remove then and just pass
{{cqlRowLimit}} directly as first argument in {{SelectStatement.getDataLimits}}.
* Related to the previous point, some phrasing is a tad confusing, like a
comment saying {{GroupByPartitionIterator for queries without Group By}} in
{{GroupByQueryPager}}. I think that's because {{GroupSpecification}},
{{GroupByLimits}} and {{GroupByQueryPager}} classes are really about handling
aggregates in general, not query having an explicit {{GROUP BY}}. So maybe
renaming to {{AggregationSpecification}}, {{AggregationLimits}} and
{{AggregationQueryPager}} would be more clear?
* The logic in {{SelectStatement.RawStatement.getGroupBySpecification}} breaks
(doesn't throw a "proper" exception) if someone write {{SELECT * FROM t GROUP
BY a, a}} where {{a}} is the sole PK column (the query is stupid, but it should
throw a proper exception). More generally, I feel that the logic don't make it
too intuitive that we're indeed checking proper ordering of the column. As a
nit, instead of copying both partition and clustering columns in
{{CFMetaData.primaryKeyColumns}}, I'd use {{Iterables.concat}} instead (and use
an iterator in getGroupBySpecification, which would make it even more apparent
than not checking {{hasNext}} is a bad idea).
* Having {{CQLGroupByLimits}} inherit {{estimateTotalResults}} from
{{CQLLimits}} is theoretically wrong as we should return an estimate of groups.
Now, properly doing that is hard and so for now I'm fine reusing the one of
{{CQLLimits}} for simplicity but we should call this out clearly by overriding
the method (having it just call {{super.estimateTotalResults()}}) with a
comment explaining we know it's off and should ideally be fixed someday. In
fact, I wonder if having {{CQLGroupByLimits}} inherit {{CQLLimits}} is a good
idea: they work sufficiently differently that it feels risky to have method
inherited silently. The only other method not overriden is
{{hasEnoughLiveData}} and we could easily use a {{private static}} helper for
code reuse in that case. Anyway, I don't mind the inheritence terribly if you
prefer but figured I'd mention it.
* Not sure about {{CQLGroupByLimits.forShortReadRetry()}}. I believe putting no
limit on the number of rows (and only on the group) might lead to OOM. In
fact, I need to think more carefully about this but I'm not 100% sure that the
short read logic isn't throw off by the fact that {{counted()}} returns a
number of groups not rows.
* In {{GroupByQueryPager}}, both iterator implementation have a
{{correctPageSize()}} method that is unused. And in the case of
{{AggregationPartitionIterator}}, it seems fishy that it's not used.
* Shouldn't {{GroupByQueryPager.GoupByPartitionIterator.close()}} set
{{closed}} if it isn't set?
* I think we lack some validation/handling of composite partition keys. If I
have {{PRIMARY KEY ((a, b), c)}} and I do {{SELECT count( *) FROM t GROUP BY
a}}, I believe this silently do the grouping by both {{a}} and {{b}}. We should
reject this since we have no way to handle it.
Also, a bunch of minor nits:
* {{GroupByQueryPager.GroupByPartitionIterator.GroupByRowIterator}} could call
{{rowIterator.partitionKey()}} rather than stores the partition key separately.
Regarding that class, we also should remove all the {{@Override}} according to
the [code style|https://wiki.apache.org/cassandra/CodeStyle].
* Why do we have {{rowPageSize}} in {{CQLGroupByLimits}}. The ctor seems to
always make it equal to {{rowLimit}} and both fields are {{final}} (so will
stay equal).
* In {{Selection.resultSetBuilder}}, the {{isJons}} parameter is a typo for
{{isJson}}. I know this is not new to this patch but lets fix it rather than
duplicate the typo.
* Your IDE is not respecting the import order of the code-style. It moved
{{com.google}} imports after the {{org.apache}} ones in {{Selection.java}} for
instance (see the [code style|https://wiki.apache.org/cassandra/CodeStyle]). I
also don't love that it expands {{*}} as this adds noise to patches but that
might be more personal.
* We can now remove the {{// Note that if there are some nodes in the cluster
with a version less than 2.0, we can't use paging (CASSANDRA-6707)}} line (in
{{SelectStatement.getDataLimits}}).
* Could add a comment in
{{SelectStatement.RawStatement.getGroupBySpecification()}} as to why we don't
do anything with partition key columns.
* Was there a reason to change {{s/cqlRowLimit/userLimit}} in the last 2 lines
of {{SelectStatement.getDataLimits()}}? Pretty sure it doesn't matter but since
I don't find the change particularly better I'm left wondering if I'm missing
the reason for the change.
* Could be wrong, but it doesn't seem we need to add {{withUpdatedLimit()}} to
{{ReadQuery}} (that is, we only use it on a {{ReadCommand}}). If true, I'd
rather remove it from {{ReadQuery}} which allow to remove the unused
implementation in {{SinglePartitionReadCommand.Group}}.
* Indentation for the parameters of {{GroupMaker.newInstance}} is off.
* It would make more sense for {{QueryPager.EMPTY.withUpdatedLimit()}} to throw
{{UnsupportedOperationException}}.
* In {{DataLimits.CQLLimits.hasEnoughLiveData}}, a comment has been butchered.
* Some indentation is off in {{DataLimits.Serializer.deserialize}}.
* In {{GroupByQueryPager.handlePagingOff}}, the comment is incomplete.
* Unecessary import of {{Optional}} in {{Restrictions.java}}.
> Add support for Group By to Select statement
> --------------------------------------------
>
> Key: CASSANDRA-10707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: Benjamin Lerer
> Assignee: Benjamin Lerer
>
> Now that Cassandra support aggregate functions, it makes sense to support
> {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the
> clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP
> BY partitionKey, clustering0, clustering1;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)