[ 
https://issues.apache.org/jira/browse/CALCITE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674547#comment-16674547
 ] 

Julian Hyde commented on CALCITE-2224:
--------------------------------------

Sorry, I missed the SqlValidatorTest cases.

Over the weekend, I reviewed your PR and re-worked it a little. The result is 
in https://github.com/julianhyde/calcite/tree/2224-within-group:
* The argument list of RelBuider.aggregateCall has been growing (we recently 
added approximate and filter, and you added sortKeys). I refactored, moving the 
methods to AggCall (see CALCITE-2654).
* In reference.md, I removed 'WITHIN GROUP' from 'agg\(\*)'. (The only 'agg' 
that allows '*' is 'COUNT', and 'COUNT does not allow 'WITHIN GROUP'.)
* Can you clarify the relationship between SqlOperator.requiresOrder() and 
SqlOperator.allowsOrderedAggregate()? Do the same functions tend to return true 
for both? Which functions have different values? Should we make the names of 
the method more similar?
* Can you clarify the relationship between SqlOperator.allowsOrderedAggregate() 
and SqlAggFunction.ignoreAggregateOrder? Does ignoreAggregateOrder apply to 
windowed aggregates? to functions that SHOULD have order (like RANK) but the 
query does not to specify it? to functions that SHOULD NOT have order (like 
SUM) but where the query does not specify it? I think ignoreAggregateOrder 
should be a property of SqlConformance, not a property of individual functions. 
Or perhaps allowsOrderedAggregate() should return the values MANDATORY, 
OPTIONAL, IGNORED, FORBIDDEN.
* I see that order keys are always provided: never null, sometimes empty. This 
is probably OK because 'WITHIN GROUP' does not allow an empty ORDER BY clause. 
Do think it is likely that we would ever want to distinguish between a missing 
and empty?
* Is it important to wrap LazySource as UnmodifiableIterable? This places an 
overhead every time you call next(). Could be done more efficiently by 
converting the ArrayList to ImmutableList. Or just leave it mutable.
* I don't like that you have added an extra argument to SqlCall. SqlCall is 
extremely widely used and needs to stay very simple. We handled OVER and FILTER 
by adding new operators in SqlStdOperatorTable, and new SqlKind values. Can you 
do the same for WITHIN_GROUP, and remove SqlCall.getAggOrderList()? (I know 
this is a fair amount of work. But expanding SqlCall doesn't scale. If you 
search for uses of SqlKind.FILTER you should be able to do the same thing in 
many cases.)
* Please add a test similar to CALCITE-1910. WITHIN GROUP is vulnerable to the 
same problems as FILTER.

Sorry it's such a long list. Your work is very high quality, and you have done 
almost everything right. With these changes we can commit shortly.

Please do your work on top of 
https://github.com/julianhyde/calcite/tree/2224-within-group.

> WITHIN GROUP clause for aggregate functions
> -------------------------------------------
>
>                 Key: CALCITE-2224
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2224
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: 1.18.0
>
>
> The {{WITHIN GROUP}} clause lets aggregate functions operate on a sorted list 
> of rows, rather than the usual unsorted collection. Order only matters for a 
> few aggregate functions, but we should allow it for all.
> Other analytic functions where {{WITHIN GROUP}} would have an effect: 
> {{RANK}}, {{PERCENT_RANK}}, {{FIRST_VALUE}}, {{LAST_VALUE}}, 
> {{PERCENTILE_CONT}}, {{PERCENTILE_DISC}}.
> {{LISTAGG(value [, separator])}} is an [Oracle 
> function|https://docs.oracle.com/cloud/latest/db112/SQLRF/functions089.htm#SQLRF30030]
>  that concatenates strings. E.g.
> {code:java}
> SELECT LISTAGG(last_name, '; ')
>          WITHIN GROUP (ORDER BY hire_date, last_name)
>   FROM Emp
> GROUP BY deptno{code}
> {{STRING_AGG(value [, separator])}} is a [Microsoft SQL Server function|] 
> that does something similar.
> {{GROUP_CONCAT(value [, separator] [ORDER BY expr [, expr]...)}} is the 
> [MySQL|https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat]
>  equivalent to {{LISTAGG}}. Note the optional {{ORDER BY}} clause within the 
> parentheses.
> {{COLLECT(value)}} is a SQL standard aggregate function that creates 
> multisets. Oracle added a non-standard {{ORDER BY}} clause within the 
> parentheses.
> In my opinion, {{WITHIN GROUP}} should always be optional. {{LISTAGG}} 
> without {{WITHIN GROUP}} would produce non-deterministic output (which is 
> OK); other aggregate functions such as {{MIN}} and {{SUM}} would just ignore 
> {{WITHIN GROUP}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to