[
https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467360#comment-16467360
]
ASF GitHub Bot commented on DRILL-5913:
---------------------------------------
weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of
same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387387738
@vvysotskyi I have tested the JIRA issue SQL on the current master and it
passed. But another new test case:
```
@Test
public void testDRILL5913_t() throws Exception {
  test("select n_nationkey, stddev((case when ( bigint_col ) >0 then 1 else 0 end)) * 1.0 as col1, avg((case when ( bigint_col) >0 then 1 else 0 end)) * 1.0 as col2 from "
      + "( select n_name,n_nationkey, sum( n_regionkey) as bigint_col from cp.`tpch/nation.parquet` group by n_name,n_nationkey ) t group by n_nationkey");
}
```
will throw another exception on Drill 1.13 with Calcite 1.15, but passes on
the current master. The exception message is:
```
Caused by: java.lang.AssertionError: Type mismatch:
rel rowtype:
RecordType(ANY n_nationkey, BIGINT $f1, BIGINT $f2, BIGINT NOT NULL $f3, BIGINT $f4) NOT NULL
equivRel rowtype:
RecordType(ANY n_nationkey, BIGINT $f1, BIGINT $f2, BIGINT NOT NULL $f3, BIGINT NOT NULL $f4) NOT NULL
```
The root cause is that DrillReduceAggregatesRule.reduceAgg invokes
RexBuilder.addAggCall, whose aggCallMapping parameter acts as an AggCall
cache. That cache keys only on the call name, not on the data type. The
current Calcite master has not changed this part since I found the bug. I
can't exhaustively enumerate test cases to prove our current master
implementation correct, but it seems safer to merge my tuned code
(validating the AggCall cache against the data type) into master to prevent
possible future issues.
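To illustrate the failure mode in isolation: below is a minimal, hypothetical sketch using plain Java collections (not Calcite's actual aggCallMapping API; class and method names are invented) of how a cache keyed by function name alone conflates aggregate calls whose operands differ in type, and how including the type in the key keeps them distinct:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Calcite's aggregate-call bookkeeping.
public class AggCallCacheSketch {

    // A reduced aggregate call: function name plus operand type.
    record AggCall(String function, String operandType) {}

    // Buggy lookup: keyed by function name only, so SUM0 over INTEGER
    // and SUM0 over BIGINT collide on the same cache entry.
    static AggCall lookupByNameOnly(Map<String, AggCall> cache, AggCall call) {
        cache.putIfAbsent(call.function(), call);
        return cache.get(call.function());
    }

    // Fixed lookup: the key includes the operand type (record equality
    // covers both fields), so the two SUM0 calls stay distinct.
    static AggCall lookupByNameAndType(Map<AggCall, AggCall> cache, AggCall call) {
        cache.putIfAbsent(call, call);
        return cache.get(call);
    }

    public static void main(String[] args) {
        AggCall intSum0 = new AggCall("$SUM0", "INTEGER");
        AggCall bigSum0 = new AggCall("$SUM0", "BIGINT");

        Map<String, AggCall> buggy = new HashMap<>();
        lookupByNameOnly(buggy, intSum0);
        // The BIGINT call is answered with the cached INTEGER one: type mismatch.
        System.out.println(lookupByNameOnly(buggy, bigSum0).operandType()); // INTEGER

        Map<AggCall, AggCall> fixed = new HashMap<>();
        lookupByNameAndType(fixed, intSum0);
        System.out.println(lookupByNameAndType(fixed, bigSum0).operandType()); // BIGINT
    }
}
```

The name-only cache silently answers the BIGINT request with the INTEGER entry, which is exactly the mismatch the planner's assertion catches.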
> DrillReduceAggregatesRule mixed the same functions of the same inputRef which
> have different dataTypes
> -------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5913
> URL: https://issues.apache.org/jira/browse/DRILL-5913
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.9.0, 1.11.0
> Reporter: weijie.tong
> Priority: Major
>
> sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as int)) as col2 from cp.`employee.json`
> {code}
> error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.drill.exec.work.foreman.Foreman.run():294
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.calcite.util.Util.newInternal():792
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that stddev_samp(cast(employee_id as int)) is reduced to
> sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced
> to sum0($0), by DrillReduceAggregatesRule's first match. The second match
> then reduces stddev_samp's sum($0) to sum0($0) as well, but this sum0($0)'s
> data type differs from the first one's: one is integer, the other bigint.
> Calcite's addAggCall method treats them as the same by ignoring their data
> types, which leads to the bigint sum0($0) being replaced by the integer
> sum0($0).
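For context on why the rule emits those overlapping sum/count calls at all: stddev_samp is decomposed into simpler aggregates and recombined arithmetically. A minimal sketch of that identity over plain arrays (class and method names are illustrative, not Drill's actual code):

```java
// Sketch of the identity DrillReduceAggregatesRule applies: stddev_samp(x)
// rewritten in terms of SUM(x*x), SUM(x), and COUNT(x).
public class StddevReductionSketch {

    // Single pass collecting the reduced aggregates, then the rewrite:
    // stddev_samp = sqrt((SUM(x^2) - SUM(x)^2 / n) / (n - 1))
    static double stddevReduced(double[] x) {
        double sum = 0, sumSq = 0;
        long n = 0;
        for (double v : x) { sum += v; sumSq += v * v; n++; }
        return Math.sqrt((sumSq - sum * sum / n) / (n - 1));
    }

    // Direct two-pass definition of sample standard deviation, for comparison.
    static double stddevDirect(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        double mean = sum / x.length, acc = 0;
        for (double v : x) acc += (v - mean) * (v - mean);
        return Math.sqrt(acc / (x.length - 1));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        // Both formulations agree to floating-point tolerance.
        System.out.println(Math.abs(stddevReduced(x) - stddevDirect(x)) < 1e-9); // true
    }
}
```

Because the rewrite shares the SUM/COUNT building blocks with any plain sum() in the same query, a cache that ignores operand types can hand one aggregate's call to the other.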