[
https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132148#comment-14132148
]
Sylvain Lebresne commented on CASSANDRA-4914:
---------------------------------------------
The general approach lgtm but a bunch of remarks:
* I'm not sure about the sum and avg return type. For sum, using {{bigint}}
whatever the input type is is wrong for at least {{varint}}. We could use
{{varint}} as a return type instead, but I wonder if it's really that
convenient. Typically, in the java driver, {{varint}} is mapped to
{{BigInteger}} and I'm not sure always returning that even when summing {{int}}
is what people wants in general. What I suggest is to have {{sum}} and {{avg}}
method always return the same type than their argument, and that we allow users
to type-cast the argument if they want a greater precision that the input type
(from a quick look at their doc, that seems to be what oracle does, haven't
checked other).
* I'd rather get rid of the specific code for {{COUNT(*)}} in SelectStatement.
It sounds to me that we could just have a specific no-arg function for that (we
could still have some special casing in the parser).
* In {{ScalarFunctionSelector.isAggregate}}, we shouldn't assume that the
function has one argument, it can have none (this is currently breaking a
UFTest unit test). Also, I believe something like:
{noformat}
SELECT max(a), add(max(b), c) FROM ..
{noformat}
is not rejected (because {{ScalarFunctionSelector}} only even check it's
first argument). I'll note though that it could make sense to allow:
{noformat}
SELECT max(a), log(max(b), 2) FROM ..
{noformat}
so maybe we should (not that it's a big deal if we don't, but unless I'm
wrong, it's reasonably trivial to so ...). As an aside, a comment on why
{{ScalarFunctionSector}} don't just return true for {{isAggregate}} could be
useful here.
* Since Selector has a reset() method, why not use that to reset the
{{current}} value for {{SimpleSelection}} and others?
* I'd have a preference for making ScalarFunction/AggregateFunction be abstract
classes and have them implement isAggregate (instead of having the
NativeScalarFunction/NativeAggretationFunction).
* Needs to switch a 'assertInvalid' to 'assertInvalidSyntax' in
{{AggregationTest.testInvalidCalls}}.
A couple of nitpicks too:
* There is a typo in the error message in
{{AggregateFunctionSelector.validateArguments}}.
* Regarding code style, we don't put @Override when implementing an interface
(AggregateFcts is full of them).
* I believe {{Selection.reset()}} is unused. Maybe we can remove it for now
until we actually need it.
* I'd shorten {{SelectionWithFunctions.validateSelectors}} to something like
{noformat}
int aggregations = 0;
for (Select s : selectors)
if (s.isAggregate())
++aggregations;
if (aggregations != 0 && aggregations != selectors.size())
throw new InvalidRequestException("the select clause must either contains
only aggregations or none");
{noformat}
Feels like too many ifs for what it does currently.
> Aggregation functions in CQL
> ----------------------------
>
> Key: CASSANDRA-4914
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Vijay
> Assignee: Benjamin Lerer
> Labels: cql, docs
> Fix For: 3.0
>
> Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (Wide row of column
> values of int, double, float etc).
> With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
>
> empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
> 130 | 3 | joe | doe | 10.1
> 130 | 2 | joe | doe | 100
> 130 | 1 | joe | doe | 1e+03
>
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>
> sum(salary) | empid
> -------------+--------
> 1110.1 | 130
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)