[ 
https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132148#comment-14132148
 ] 

Sylvain Lebresne commented on CASSANDRA-4914:
---------------------------------------------

The general approach lgtm but a bunch of remarks:
* I'm not sure about the sum and avg return type. For sum, using {{bigint}} 
whatever the input type is is wrong for at least {{varint}}. We could use 
{{varint}} as a return type instead, but I wonder if it's really that 
convenient. Typically, in the java driver, {{varint}} is mapped to 
{{BigInteger}} and I'm not sure always returning that even when summing {{int}} 
is what people wants in general. What I suggest is to have {{sum}} and {{avg}} 
method always return the same type than their argument, and that we allow users 
to type-cast the argument if they want a greater precision that the input type 
(from a quick look at their doc, that seems to be what oracle does, haven't 
checked other).
* I'd rather get rid of the specific code for {{COUNT(*)}} in SelectStatement. 
It sounds to me that we could just have a specific no-arg function for that (we 
could still have some special casing in the parser).
* In {{ScalarFunctionSelector.isAggregate}}, we shouldn't assume that the 
function has one argument, it can have none (this is currently breaking a 
UFTest unit test). Also, I believe something like:
  {noformat}
  SELECT max(a), add(max(b), c)  FROM ..
  {noformat}
  is not rejected (because {{ScalarFunctionSelector}} only even check it's 
first argument). I'll note though that it could make sense to allow:
  {noformat}
  SELECT max(a), log(max(b), 2)  FROM ..
  {noformat}
  so maybe we should (not that it's a big deal if we don't, but unless I'm 
wrong, it's reasonably trivial to so ...).  As an aside, a comment on why 
{{ScalarFunctionSector}} don't just return true for {{isAggregate}} could be 
useful here.
* Since Selector has a reset() method, why not use that to reset the 
{{current}} value for {{SimpleSelection}} and others?
* I'd have a preference for making ScalarFunction/AggregateFunction be abstract 
classes and have them implement isAggregate (instead of having the 
NativeScalarFunction/NativeAggretationFunction).  
* Needs to switch a 'assertInvalid' to 'assertInvalidSyntax' in 
{{AggregationTest.testInvalidCalls}}.

A couple of nitpicks too:
* There is a typo in the error message in 
{{AggregateFunctionSelector.validateArguments}}.
* Regarding code style, we don't put @Override when implementing an interface 
(AggregateFcts is full of them).
* I believe {{Selection.reset()}} is unused. Maybe we can remove it for now 
until we actually need it.
* I'd shorten {{SelectionWithFunctions.validateSelectors}} to something like
  {noformat}
  int aggregations = 0;
  for (Select s : selectors)
      if (s.isAggregate())
          ++aggregations;

  if (aggregations != 0 && aggregations != selectors.size())
      throw new InvalidRequestException("the select clause must either contains 
only aggregations or none");
  {noformat}
  Feels like too many ifs for what it does currently.


> Aggregation functions in CQL
> ----------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Vijay
>            Assignee: Benjamin Lerer
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914.txt
>
>
> The requirement is to do aggregation of data in Cassandra (Wide row of column 
> values of int, double, float etc).
> With some basic agree gate functions like AVG, SUM, Mean, Min, Max, etc (for 
> the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;                  
>                   
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);                      
>               
>  sum(salary) | empid
> -------------+--------
>    1110.1    |  130



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to