[jira] [Commented] (CASSANDRA-17811) CQL aggregation functions on collections, tuples and UDTs

Jira Thu, 20 Oct 2022 12:27:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621329#comment-17621329
 ]


Andres de la Peña commented on CASSANDRA-17811:
-----------------------------------------------

Actually, we can go a bit further with the refactor.

All the callers of {{Schema#findFunction}}, {{Schema#getFunctions}} and 
{{KeyspaceMetadata.functions}} are only interested in user functions. Indeed, 
the statements for creating, deleting, changing permissions or describing a 
function are about user functions, never native functions. The only exception 
to this is {{AbstractFunctionSelectorDeserializer}}, which can perfectly well 
use {{FunctionResolver.get}} instead.

I think we can entirely remove the native functions from the schema package, so 
{{Schema}} and {{KeyspaceMetadata}} only know about the user functions that 
they store. The statements that manipulate these user functions would keep 
directly interacting with the schema. Conversely, the native functions can be 
kept in the dedicated {{NativeFunctions}} singleton, inside the cql package. 
The one and only point of access to get both native and user functions will 
then be {{FunctionResolver}}.

I think this approach makes it very clear where the functions are stored and 
what kind of function each class/method uses. Also, it reduces coupling between 
the {{schema}} and {{functions}} packages by making native functions a CQL 
thing and user functions a schema thing, both accessible through the common 
{{FunctionResolver}} entry point.
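
To make the proposed split concrete, here is a minimal sketch of the layering. It is only an illustration under stated assumptions: the classes {{Fn}}, {{NativeFunctionsSketch}}, {{SchemaSketch}} and {{FunctionResolverSketch}} are hypothetical stand-ins, not the real Cassandra classes or their signatures, and the lookup only handles keyspace-qualified names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT Cassandra's real classes or signatures.
final class Fn {
    final String keyspace;   // "system" for native functions in this sketch
    final String name;
    final boolean isNative;

    Fn(String keyspace, String name, boolean isNative) {
        this.keyspace = keyspace;
        this.name = name;
        this.isNative = isNative;
    }
}

// CQL side: native functions live in a dedicated singleton.
final class NativeFunctionsSketch {
    static final NativeFunctionsSketch INSTANCE = new NativeFunctionsSketch();
    private final Map<String, Fn> byName = new HashMap<>();

    private NativeFunctionsSketch() {
        for (String n : List.of("max", "min", "count"))
            byName.put(n, new Fn("system", n, true));
    }

    Fn get(String name) {
        return byName.get(name);
    }
}

// Schema side: only stores user-defined functions, knows nothing about native ones.
final class SchemaSketch {
    private final Map<String, Fn> userFunctions = new HashMap<>();

    void addUserFunction(String keyspace, String name) {
        userFunctions.put(keyspace + "." + name, new Fn(keyspace, name, false));
    }

    Fn findUserFunction(String keyspace, String name) {
        return userFunctions.get(keyspace + "." + name);
    }
}

// Common entry point: the only place that sees both kinds of function.
final class FunctionResolverSketch {
    static Fn get(SchemaSketch schema, String keyspace, String name) {
        if ("system".equals(keyspace))
            return NativeFunctionsSketch.INSTANCE.get(name);
        return schema.findUserFunction(keyspace, name);
    }

    public static void main(String[] args) {
        SchemaSketch schema = new SchemaSketch();
        schema.addUserFunction("ks", "my_fn");
        System.out.println(get(schema, "system", "max").isNative);
        System.out.println(get(schema, "ks", "my_fn").isNative);
    }
}
```

In this sketch, statements that create or drop user functions would talk to {{SchemaSketch}} directly, while anything that needs to resolve a function call in a query would go through {{FunctionResolverSketch.get}}.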

> CQL aggregation functions on collections, tuples and UDTs
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-17811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> It has been found during CASSANDRA-8877 that CQL's aggregation functions 
> {{max}}, {{min}} and {{count}} can be applied to collections, but the 
> result is returned as a blob. For example:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  system.max(l)
> ------------------------------------------------------------
>  0x00000003000000040000000a0000000400000014000000040000001e
> {code}
> This happens on 3.0, 3.11, 4.0, 4.1 and trunk.
> I'm not sure whether the function shouldn't be supported for collections, 
> or whether it should be supported but the result is wrong.
> In the example above, the returned blob is the serialized value of 
> {{[10, 20, 30]}}, which is the right one according to the list comparator. I 
> think this happens because the matched version of the function is the one for 
> {{(blob) -> blob}}. We would need a {{(list<int>) -> list<int>}} function 
> instead, but this function doesn't exist.
> It would be quite easy to add versions of the {{max}}, {{min}} and 
> {{count}} functions for every type of collection ({{list<int>}}, 
> {{list<text>}}, {{map<int, int>}}, {{map<int, text>}}, etc.). The 
> downside of this approach is that it would increase the number of aggregation 
> functions kept in memory from 82 to 2722, if my maths are right. This is 
> quite an increase, mainly due to the many possible combinations of the 
> {{map}} type. 
> [Here|https://github.com/adelapena/cassandra/commit/e3ba3c2dc36ce58d06942078c708ffb93eb3cd84]
>  is a quick, incomplete prototype of the approach.
> Also, I'm not sure that applying those aggregation functions to collections 
> is very useful in practice. Thus, an alternative approach would be just 
> forbidding them, considering them not supported. I don't think it would be a 
> problem for backward compatibility since no one has complained about the 
> current behaviour, and we might well consider that the original intent was 
> not to allow aggregation on collections. At least, there aren't any tests for 
> it, and I can't find any documentation about it either.
> Another idea that comes to mind is that we could change the meaning of those 
> functions to aggregate the values within the collection, instead of 
> aggregating the rows. In that case, the behaviour would be:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  k | system.max(l)
> ---+---------------
>  1 | 30
>  0 | 3
> {code}
> Of course we could have separate function names for that type of collection 
> aggregations, like {{collectionMax}}, {{maxItem}}, or something like 
> that.
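
The two meanings of {{max}} discussed in the description above can be contrasted with a small sketch. It is only illustrative: {{CollectionMaxSketch}} and its methods are hypothetical names, and the comparator is an assumed stand-in for the list comparator (element by element, with the shorter list first on a tie), not Cassandra's actual implementation.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not Cassandra code.
final class CollectionMaxSketch {
    // Assumed stand-in for the list comparator: compare element by element,
    // and treat the shorter list as smaller when one is a prefix of the other.
    static final Comparator<List<Integer>> LIST_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0)
                return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // Row aggregation: one result for the whole table, the greatest list.
    static List<Integer> maxAcrossRows(List<List<Integer>> rows) {
        return Collections.max(rows, LIST_ORDER);
    }

    // Collection aggregation: one result per row, the greatest item in its list.
    static int maxWithinCollection(List<Integer> values) {
        return Collections.max(values);
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = List.of(List.of(1, 2, 3), List.of(10, 20, 30));
        System.out.println(maxAcrossRows(rows));
        for (List<Integer> row : rows)
            System.out.println(maxWithinCollection(row));
    }
}
```

With the two rows from the example, row aggregation selects the list {{[10, 20, 30]}}, while aggregating within each collection gives 3 and 30.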



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

