[GitHub] [druid] clintropolis opened a new pull request, #12880: fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule

GitBox Mon, 08 Aug 2022 19:19:29 -0700


clintropolis opened a new pull request, #12880:
URL: https://github.com/apache/druid/pull/12880


   ### Description
   This PR fixes an issue that occurs when using a `SUM` function in sub-query 
that is a result of what I believe to be a bug in Calcites 
`SqlSplittableAggFunction.AbstractSumSplitter`. Queries impacted will be of 
this form and involve INTEGER (or FLOAT) literals:
   
   ```
   SELECT
     dim1,
     SUM(CASE WHEN sum_l1 = 0 THEN 1 ELSE 0 END) AS outer_l1
   from (
     select
       dim1,
       SUM(l1) as sum_l1
     from numfoo
     group by dim1
   )
   group by 1
   ```
   
   The problem arises because `RelDataTypeSystem` implementations are allowed 
to define the built-in `SUM` function return type inference via 
[`deriveSumType`](https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidTypeSystem.java#L102)
 which we have done, however 
[`AggregateReduceRule`](https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rules/AggregateRemoveRule.java#L112)
 attempts to simplify the query using `SqlSplittableAggFunction.singleton`, 
which "Generates an expression for the value of the aggregate function when 
applied to a single row." in this case, to compare the subquery signature to 
the outer signature to see if it can remove the outer aggregate (and eventually 
collapse the outer and inner query).
   
   The default [implementation of this for 
`SUM`](https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlSplittableAggFunction.java#L263)
 however just passes through the argument as the output, which in this case is 
an `INTEGER` literal, instead of a `BIGINT` which the Druid type system would 
have chosen if it was consulted. This is the part I believe to be a bug, and 
for my fix I have duplicated `SqlSumAggFunction` as `DruidSumAggFunction` and 
overwritten `unwrap` to use a new `DruidSumSplitter` which checks to see if the 
type of the `aggregateCall` is the same as the type of the argument, otherwise 
it will wrap the identifier in a cast expression to ensure it is the correct 
type.
   
   Without this, the query planning fails with an error like
   ```
   java.lang.AssertionError: Type mismatch:
   rowtype of new rel:
   RecordType(VARCHAR dim2, BIGINT NOT NULL outer_l2) NOT NULL
   rowtype of set:
   RecordType(VARCHAR dim2, INTEGER NOT NULL $f1) NOT NULL
   ```
   
   I will attempt to make an upstream patch to make the changes here 
unnecessary at some point, but this should provide a workaround for now.
   
   <hr>
   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis opened a new pull request, #12880: fix issue with SQL sum aggregator due to bug with DruidTypeSystem and AggregateRemoveRule

Reply via email to