[
https://issues.apache.org/jira/browse/CALCITE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674547#comment-16674547
]
Julian Hyde commented on CALCITE-2224:
--------------------------------------
Sorry, I missed the SqlValidatorTest cases.
Over the weekend, I reviewed your PR and re-worked it a little. The result is
in https://github.com/julianhyde/calcite/tree/2224-within-group:
* The argument list of RelBuider.aggregateCall has been growing (we recently
added approximate and filter, and you added sortKeys). I refactored, moving the
methods to AggCall (see CALCITE-2654).
* In reference.md, I removed 'WITHIN GROUP' from 'agg\(\*)'. (The only 'agg'
that allows '*' is 'COUNT', and 'COUNT does not allow 'WITHIN GROUP'.)
* Can you clarify the relationship between SqlOperator.requiresOrder() and
SqlOperator.allowsOrderedAggregate()? Do the same functions tend to return true
for both? Which functions have different values? Should we make the names of
the method more similar?
* Can you clarify the relationship between SqlOperator.allowsOrderedAggregate()
and SqlAggFunction.ignoreAggregateOrder? Does ignoreAggregateOrder apply to
windowed aggregates? to functions that SHOULD have order (like RANK) but the
query does not to specify it? to functions that SHOULD NOT have order (like
SUM) but where the query does not specify it? I think ignoreAggregateOrder
should be a property of SqlConformance, not a property of individual functions.
Or perhaps allowsOrderedAggregate() should return the values MANDATORY,
OPTIONAL, IGNORED, FORBIDDEN.
* I see that order keys are always provided: never null, sometimes empty. This
is probably OK because 'WITHIN GROUP' does not allow an empty ORDER BY clause.
Do think it is likely that we would ever want to distinguish between a missing
and empty?
* Is it important to wrap LazySource as UnmodifiableIterable? This places an
overhead every time you call next(). Could be done more efficiently by
converting the ArrayList to ImmutableList. Or just leave it mutable.
* I don't like that you have added an extra argument to SqlCall. SqlCall is
extremely widely used and needs to stay very simple. We handled OVER and FILTER
by adding new operators in SqlStdOperatorTable, and new SqlKind values. Can you
do the same for WITHIN_GROUP, and remove SqlCall.getAggOrderList()? (I know
this is a fair amount of work. But expanding SqlCall doesn't scale. If you
search for uses of SqlKind.FILTER you should be able to do the same thing in
many cases.)
* Please add a test similar to CALCITE-1910. WITHIN GROUP is vulnerable to the
same problems as FILTER.
Sorry it's such a long list. Your work is very high quality, and you have done
almost everything right. With these changes we can commit shortly.
Please do your work on top of
https://github.com/julianhyde/calcite/tree/2224-within-group.
> WITHIN GROUP clause for aggregate functions
> -------------------------------------------
>
> Key: CALCITE-2224
> URL: https://issues.apache.org/jira/browse/CALCITE-2224
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.18.0
>
>
> The {{WITHIN GROUP}} clause lets aggregate functions operate on a sorted list
> of rows, rather than the usual unsorted collection. Order only matters for a
> few aggregate functions, but we should allow it for all.
> Other analytic functions where {{WITHIN GROUP}} would have an effect:
> {{RANK}}, {{PERCENT_RANK}}, {{FIRST_VALUE}}, {{LAST_VALUE}},
> {{PERCENTILE_CONT}}, {{PERCENTILE_DISC}}.
> {{LISTAGG(value [, separator])}} is an [Oracle
> function|https://docs.oracle.com/cloud/latest/db112/SQLRF/functions089.htm#SQLRF30030]
> that concatenates strings. E.g.
> {code:java}
> SELECT LISTAGG(last_name, '; ')
> WITHIN GROUP (ORDER BY hire_date, last_name)
> FROM Emp
> GROUP BY deptno{code}
> {{STRING_AGG(value [, separator])}} is a [Microsoft SQL Server function|]
> that does something similar.
> {{GROUP_CONCAT(value [, separator] [ORDER BY expr [, expr]...)}} is the
> [MySQL|https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat]
> equivalent to {{LISTAGG}}. Note the optional {{ORDER BY}} clause within the
> parentheses.
> {{COLLECT(value)}} is a SQL standard aggregate function that creates
> multisets. Oracle added a non-standard {{ORDER BY}} clause within the
> parentheses.
> In my opinion, {{WITHIN GROUP}} should always be optional. {{LISTAGG}}
> without {{WITHIN GROUP}} would produce non-deterministic output (which is
> OK); other aggregate functions such as {{MIN}} and {{SUM}} would just ignore
> {{WITHIN GROUP}}.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)