[jira] [Commented] (CASSANDRA-17811) CQL aggregation functions on collections, tuples and UDTs

Jira Thu, 20 Oct 2022 05:11:03 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621056#comment-17621056
 ]


Andres de la Peña commented on CASSANDRA-17811:
-----------------------------------------------

I have done some refactoring to follow [~blerer]'s suggestions:

# There is a new {{UserFunction}} abstract class, extended by UDAs and UDFs. So 
all functions are either {{NativeFunction}} or {{UserFunction}}. The only 
purpose of this class is to make it clear what kind of functions a class holds 
or what kind of functions a method returns.
# There is a new {{NativeFunctions}} class that holds all the existing native 
functions and native function factories. Its only instance is stored on 
{{SystemKeyspace}}.
# The class {{KeyspaceMetadata.Functions}} is renamed to {{UserFunctions}}. It 
doesn't store native functions anymore, but only user functions.
# The method {{Schema#findFunction}} looks into both {{UserFunctions}} and 
{{NativeFunctions}} to find the function with specified signature, acting as a 
single access point for retrieving functions by their exact signature.
# The method {{FunctionResolver.get}} looks into both {{UserFunctions}} and 
{{NativeFunctions}} to find the best function compatible with the specified 
function call, which might involve trying to infer the types of parameters from 
the call.
# I have added a factory for the {{token}} function, registered in 
{{NativeFunctions}} like any other native function factory. Previously it was 
special-cased in {{FunctionResolver}} and not included in the registered 
{{Functions}}.
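
As a rough illustration of the single access point described for {{Schema#findFunction}}, here is a minimal sketch. Everything in it is a simplified stand-in and an assumption of the sketch, not the actual patch: the {{Registry}} class, the string-based signature key and the native-first lookup order are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified stand-ins, NOT Cassandra's real classes.
class FunctionLookupSketch {

    // Hypothetical registry keyed by exact signature, e.g. "max(int)".
    static class Registry {
        private final Map<String, String> functions = new HashMap<>();

        private static String key(String name, List<String> argTypes) {
            return name + "(" + String.join(", ", argTypes) + ")";
        }

        void add(String name, List<String> argTypes, String description) {
            functions.put(key(name, argTypes), description);
        }

        Optional<String> find(String name, List<String> argTypes) {
            return Optional.ofNullable(functions.get(key(name, argTypes)));
        }
    }

    // Single access point: look in the native registry, then in the user one.
    // The lookup order is an assumption of this sketch.
    static Optional<String> findFunction(Registry nativeFunctions, Registry userFunctions,
                                         String name, List<String> argTypes) {
        Optional<String> found = nativeFunctions.find(name, argTypes);
        return found.isPresent() ? found : userFunctions.find(name, argTypes);
    }

    public static void main(String[] args) {
        Registry nativeFunctions = new Registry();
        Registry userFunctions = new Registry();
        nativeFunctions.add("max", Arrays.asList("int"), "native max(int)");
        userFunctions.add("my_avg", Arrays.asList("int"), "user my_avg(int)"); // hypothetical UDF

        System.out.println(findFunction(nativeFunctions, userFunctions, "max", Arrays.asList("int")).orElse("not found"));
        System.out.println(findFunction(nativeFunctions, userFunctions, "my_avg", Arrays.asList("int")).orElse("not found"));
        // No exact (list<int>) signature is registered, so the exact lookup finds nothing.
        System.out.println(findFunction(nativeFunctions, userFunctions, "max", Arrays.asList("list<int>")).orElse("not found"));
    }
}
```

The sketch only covers lookup by exact signature. Resolving the best compatible function for a call, as {{FunctionResolver.get}} does, would additionally need type inference and is left out.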

> CQL aggregation functions on collections, tuples and UDTs
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-17811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> It has been found during CASSANDRA-8877 that CQL's aggregation functions 
> {{max}}, {{min}} and {{count}} can be applied to collections, but the 
> result is returned as a blob. For example:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  system.max(l)
> ------------------------------------------------------------
>  0x00000003000000040000000a0000000400000014000000040000001e
> {code}
> This happens on 3.0, 3.11, 4.0, 4.1 and trunk.
> I'm not sure whether the function shouldn't be supported for collections, 
> or whether it should be supported but the result is wrong.
> In the example above, the returned blob is the serialized value of 
> {{[10, 20, 30]}}, which is the right one according to the list comparator. I 
> think this happens because the matched version of the function is the one for 
> {{(blob) -> blob}}. We would need a {{(list<int>) -> list<int>}} function 
> instead, but this function doesn't exist.
> It would be quite easy to add versions of the {{max}}, {{min}} and 
> {{count}} functions for every type of collection ({{list<int>}}, 
> {{list<text>}}, {{map<int, int>}}, {{map<int, text>}}, etc.). The 
> downside of this approach is that it would increase the number of aggregation 
> functions kept in memory from 82 to 2722, if my maths are right. This is 
> quite an increase, mainly due to the many possible combinations of the 
> {{map}} type. 
> [Here|https://github.com/adelapena/cassandra/commit/e3ba3c2dc36ce58d06942078c708ffb93eb3cd84]
>  is a quick, incomplete prototype of the approach.
> Also, I'm not sure that applying those aggregation functions to collections 
> is very useful in practice. Thus, an alternative approach would be just 
> forbidding them, considering them not supported. I don't think it would be a 
> problem for backward compatibility since no one has complained about the 
> current behaviour, and we might well consider that the original intent was 
> not to allow aggregation on collections. At least, there aren't any tests for 
> it, and I can't find any documentation about it either.
> Another idea that comes to mind is that we could change the meaning of those 
> functions to aggregate the values within the collection, instead of 
> aggregating the rows. In that case, the behaviour would be:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT k, max(l) FROM t;
>  k | system.max(l)
> ---+---------------
>  1 | 30
>  0 | 3
> {code}
> Of course we could have separate function names for that kind of collection 
> aggregation, like {{collectionMax}}, {{maxItem}}, or something like 
> that.
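
To check by hand that the blob in the quoted description really is {{[10, 20, 30]}}, a small decoder is enough. This is only a sketch: the layout it assumes (a 4-byte big-endian element count, then a 4-byte length and the value bytes for each element) is inferred from the bytes of that blob, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class ListBlobDecoder {

    // Converts a hex string, with or without a leading "0x", into raw bytes.
    static byte[] hexToBytes(String hex) {
        String digits = hex.startsWith("0x") ? hex.substring(2) : hex;
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Assumed layout: [int count] then, per element, [int length][length bytes].
    // For list<int> every element is expected to be exactly 4 bytes long.
    static List<Integer> decodeIntList(byte[] blob) {
        ByteBuffer buffer = ByteBuffer.wrap(blob); // ByteBuffer is big-endian by default
        int count = buffer.getInt();
        List<Integer> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int length = buffer.getInt();
            if (length != 4) {
                throw new IllegalArgumentException("Expected a 4-byte int, got length " + length);
            }
            values.add(buffer.getInt());
        }
        return values;
    }

    public static void main(String[] args) {
        String blob = "0x00000003000000040000000a0000000400000014000000040000001e";
        System.out.println(decodeIntList(hexToBytes(blob))); // prints [10, 20, 30]
    }
}
```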

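The two possible meanings of {{max}} on a {{list<int>}} column discussed in the description can be contrasted with a short sketch. The comparator below (element by element, with a shorter prefix sorting first) is an assumption standing in for the list comparator, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionMaxSketch {

    // Assumed list ordering: compare element by element, then by size.
    static final Comparator<List<Integer>> LIST_ORDER = (a, b) -> {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            int cmp = Integer.compare(a.get(i), b.get(i));
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Current meaning: one result for all rows, the greatest list as a whole.
    static List<Integer> maxAcrossRows(List<List<Integer>> rows) {
        return Collections.max(rows, LIST_ORDER);
    }

    // Alternative meaning: one result per row, the greatest item in each list.
    static List<Integer> maxWithinEachRow(List<List<Integer>> rows) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> row : rows) {
            result.add(Collections.max(row));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(10, 20, 30));
        System.out.println(maxAcrossRows(rows));    // prints [10, 20, 30]
        System.out.println(maxWithinEachRow(rows)); // prints [3, 30]
    }
}
```

Empty lists and null collections are ignored here; a real implementation would have to decide how to treat them.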


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
