Re: [PR] [CALCITE-7134] Incorrect type inference for some aggregate functions when groupSets contains '{}' [calcite]

via GitHub Tue, 19 Aug 2025 18:58:03 -0700


silundong commented on code in PR #4499:
URL: https://github.com/apache/calcite/pull/4499#discussion_r2286796987



##########
spark/src/test/java/org/apache/calcite/test/SparkAdapterTest.java:
##########
@@ -144,8 +144,8 @@ private CalciteAssert.AssertQuery sql(String sql) {
     final String plan = "PLAN="
         + "EnumerableCalc(expr#0..4=[{inputs}], expr#5=[CAST($t3):BIGINT NOT NULL], proj#0..2=[{exprs}], CNT_Y=[$t5], CNT_DIST_Y=[$t4])\n"
         + "  EnumerableAggregate(group=[{}], SUM_X=[MIN($1) FILTER $6], MIN_Y=[MIN($2) FILTER $6], MAX_Y=[MIN($3) FILTER $6], CNT_Y=[MIN($4) FILTER $6], CNT_DIST_Y=[COUNT($0) FILTER $5])\n"
-        + "    EnumerableCalc(expr#0..5=[{inputs}], expr#6=[0], expr#7=[=($t5, $t6)], expr#8=[1], expr#9=[=($t5, $t8)], proj#0..4=[{exprs}], $g_0=[$t7], $g_1=[$t9])\n"
-        + "      EnumerableAggregate(group=[{1}], groups=[[{1}, {}]], SUM_X=[$SUM0($0)], MIN_Y=[MIN($1)], MAX_Y=[MAX($1)], CNT_Y=[COUNT()], $g=[GROUPING($1)])\n"
+        + "    EnumerableCalc(expr#0..5=[{inputs}], expr#6=[0], expr#7=[=($t2, $t6)], expr#8=[null:INTEGER], expr#9=[CASE($t7, $t8, $t1)], expr#10=[=($t5, $t6)], expr#11=[1], expr#12=[=($t5, $t11)], Y=[$t0], SUM_X=[$t9], MIN_Y=[$t3], MAX_Y=[$t4], CNT_Y=[$t2], $g_0=[$t10], $g_1=[$t12])\n"

Review Comment:
   This is caused by the interaction of two rules:
   `AggregateExpandDistinctAggregatesRule` and `AggregateReduceFunctionsRule`.
   `AggregateExpandDistinctAggregatesRule` generates two Aggregates, and the
   bottom Aggregate's group sets include the empty group `{}`. The SUM in the
   original Aggregate (SUM_X in the SQL) is split into a SUM in the bottom
   Aggregate and a MIN in the top Aggregate.
   Before this PR, the SUM in the bottom Aggregate was inferred as not
   nullable, so `AggregateReduceFunctionsRule` converted it directly to SUM0.
   With this PR, the SUM in the bottom Aggregate is inferred as nullable
   (because of the empty group), so `AggregateReduceFunctionsRule` converts it
   to `CASE WHEN COUNT() = 0 THEN NULL ELSE SUM0 END`. The bottom Aggregate
   therefore gets SUM0 and COUNT in place of SUM, and that COUNT also serves as
   CNT_Y from the SQL, which is why the positions appear to have changed.
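
To make the difference between the two reductions concrete, here is a minimal sketch in plain Java. It does not use any Calcite API; the class and method names (`SumReductionSketch`, `sum0`, `count`, `nullableSum`) are made up for illustration, and it assumes the input values themselves are never NULL.

```java
import java.util.List;

// Hypothetical illustration only: models the two ways SUM can be reduced.
class SumReductionSketch {
    /** SUM0 semantics: an empty input yields 0, never NULL. */
    static long sum0(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** COUNT() semantics: the number of input rows. */
    static long count(List<Integer> values) {
        return values.size();
    }

    /** Nullable SUM, i.e. CASE WHEN COUNT() = 0 THEN NULL ELSE SUM0 END. */
    static Long nullableSum(List<Integer> values) {
        return count(values) == 0 ? null : Long.valueOf(sum0(values));
    }

    public static void main(String[] args) {
        System.out.println(sum0(List.of()));              // prints 0
        System.out.println(nullableSum(List.of()));       // prints null
        System.out.println(nullableSum(List.of(1, 2, 3))); // prints 6
    }
}
```

When SUM is treated as not nullable, only `sum0` is needed. When it is nullable, the reduction needs both `sum0` and `count`, which matches the extra COUNT that shows up in the bottom Aggregate.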



