Alessandro Solimando created CALCITE-7330:
---------------------------------------------

             Summary: AggregateCaseToFilterRule should not be applied on 
aggregate functions that don't ignore NULLs
                 Key: CALCITE-7330
                 URL: https://issues.apache.org/jira/browse/CALCITE-7330
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.41.0
         Environment: +underlined text+
            Reporter: Alessandro Solimando
            Assignee: Alessandro Solimando
             Fix For: 1.42.0


h2. Problem

_AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate 
functions where _NULL_ values have semantic significance.

The rule transforms:
{noformat}
AGG(CASE WHEN condition THEN value ELSE NULL END)
{noformat}
to:
{noformat}
AGG(value) FILTER (WHERE condition)
{noformat}
This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX, 
COUNT) which ignore {_}NULL{_}s.

However, custom user-defined aggregate functions (UDAFs) may treat NULL values 
as semantically significant inputs, making this transformation incorrect (an 
example are Postgres array aggregate functions which are mostly not ignoring 
NULL values, see the 
[doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
 The former passes NULL to the aggregate, the latter filters out those rows 
entirely.

AggregateCall.ignoreNulls() Is Not Enough

At first I thought that the existing _AggregateCall.ignoreNulls()_ could be 
used, but this field represents whether the user explicitly specified _IGNORE 
NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for 
window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_ 
nodes, so it's not modeling what we are interested into here.

My proposal is to add a _SqlAggFunction.ignoresNulls()_ method to express the 
semantic null-handling behavior of aggregate function types:
 - Returns true by default (standard SQL behavior, all built-in aggregates 
ignore NULLs), so *+fully backward compatible+*
 - Custom UDAFs can override to return false to indicate NULLs are semantically 
significant
 - _AggregateCaseToFilterRule_ checks this method before applying the 
transformation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to