Re: [PR] [CALCITE-5465] Rule of AGGREGATE_EXPAND_DISTINCT_AGGREGATES transforms wrong plan, when sql has distinct agg-call with rollup [calcite]

via GitHub Mon, 17 Nov 2025 11:22:46 -0800


mihaibudiu commented on code in PR #4636:
URL: https://github.com/apache/calcite/pull/4636#discussion_r2535226114



##########
core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java:
##########
@@ -425,39 +426,65 @@ private static RelBuilder 
convertSingletonDistinct(RelBuilder relBuilder,
     return relBuilder;
   }
 
+  /**
+   * Rewrite aggregates that use GROUPING SETS. High-level steps:
+   *
+   * <ol>
+   * <li>Build the set of grouping keys (groupSets) required by the rewrite.
+   * This includes: the original groupSet, each grouping-set from the SQL,
+   * and additional combinations needed to expand DISTINCT agg
+   * (i.e. distinct args ∪ grouping set, plus any filter arg).
+   * <li>Compute a "fullGroupSet" (union of all bits) and run a bottom 
aggregate
+   * producing GROUPING(...) and per-group boolean markers.
+   * <li>From the bottom result, build upper aggregates that pick the
+   * appropriate values per original grouping set (using GROUPING id and 
presence indicators).
+   * Finally apply a filter to remove "internal-only" rows that have no 
contributing input.

Review Comment:
   do not contribute to the output?



##########
core/src/test/resources/sql/agg.iq:
##########
@@ -1838,14 +1838,14 @@ group by deptno;
 select count(distinct deptno) as cd, count(*) as c
 from "scott".emp
 group by cube(deptno);
-+----+---+
-| CD | C |
-+----+---+
-|  1 | 3 |
-|  1 | 5 |
-|  1 | 6 |
-|  3 | 3 |
-+----+---+
++----+----+
+| CD | C  |
++----+----+
+|  1 |  3 |
+|  1 |  5 |
+|  1 |  6 |
+|  3 | 14 |

Review Comment:
   it is very unfortunate that we had an incorrect test checked in



##########
core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java:
##########
@@ -425,39 +426,65 @@ private static RelBuilder 
convertSingletonDistinct(RelBuilder relBuilder,
     return relBuilder;
   }
 
+  /**
+   * Rewrite aggregates that use GROUPING SETS. High-level steps:
+   *
+   * <ol>
+   * <li>Build the set of grouping keys (groupSets) required by the rewrite.
+   * This includes: the original groupSet, each grouping-set from the SQL,
+   * and additional combinations needed to expand DISTINCT agg
+   * (i.e. distinct args ∪ grouping set, plus any filter arg).
+   * <li>Compute a "fullGroupSet" (union of all bits) and run a bottom 
aggregate
+   * producing GROUPING(...) and per-group boolean markers.
+   * <li>From the bottom result, build upper aggregates that pick the
+   * appropriate values per original grouping set (using GROUPING id and 
presence indicators).
+   * Finally apply a filter to remove "internal-only" rows that have no 
contributing input.
+   * </ol>
+   */
   private static void rewriteUsingGroupingSets(RelOptRuleCall call,
       Aggregate aggregate) {
+    final ImmutableBitSet aggregateGroupSet = aggregate.getGroupSet();
+    final ImmutableList<ImmutableBitSet> aggregateGroupingSets = 
aggregate.getGroupSets();
+
     final Set<ImmutableBitSet> groupSetTreeSet =
         new TreeSet<>(ImmutableBitSet.ORDERING);
-    // GroupSet to distinct filter arg map,
-    // filterArg will be -1 for non-distinct agg call.
-
-    // Using `Set` here because it's possible that two agg calls
-    // have different filterArgs but same groupSet.
+    // Map from a grouping combo -> which filter args (if any) contributed

Review Comment:
   I am not sure what "combo" is supposed to mean.
   Perhaps "a set of group keys?"



##########
core/src/test/resources/sql/agg.iq:
##########
@@ -4187,4 +4187,115 @@ EnumerableCalc(expr#0..8=[{inputs}], DEPTNO=[$t0], 
ENAME=[$t2], SAL_COL=[$t3], F
       EnumerableTableScan(table=[[scott, EMP]])
 !plan
 
+# [CALCITE-5465] Rule of AGGREGATE_EXPAND_DISTINCT_AGGREGATES transforms wrong 
plan, when sql has distinct agg-call with rollup

Review Comment:
   I have validated all these using postgres.
   could be useful to have this bit of information -- then at least people can 
question tests like the one above which was wrong.



##########
core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java:
##########
@@ -425,39 +426,65 @@ private static RelBuilder 
convertSingletonDistinct(RelBuilder relBuilder,
     return relBuilder;
   }
 
+  /**
+   * Rewrite aggregates that use GROUPING SETS. High-level steps:
+   *
+   * <ol>
+   * <li>Build the set of grouping keys (groupSets) required by the rewrite.
+   * This includes: the original groupSet, each grouping-set from the SQL,
+   * and additional combinations needed to expand DISTINCT agg
+   * (i.e. distinct args ∪ grouping set, plus any filter arg).
+   * <li>Compute a "fullGroupSet" (union of all bits) and run a bottom 
aggregate
+   * producing GROUPING(...) and per-group boolean markers.
+   * <li>From the bottom result, build upper aggregates that pick the
+   * appropriate values per original grouping set (using GROUPING id and 
presence indicators).
+   * Finally apply a filter to remove "internal-only" rows that have no 
contributing input.
+   * </ol>

Review Comment:
   I think this could be much easier to review if there was an example original 
plan (as simple as possible) and an example produced plan after the 
optimization, with a few explanations about the role of each operator used.
   
   Then the code below could just cross-reference the produced plan, "produce 
line 4 from the result plan"



##########
core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java:
##########
@@ -425,39 +426,65 @@ private static RelBuilder 
convertSingletonDistinct(RelBuilder relBuilder,
     return relBuilder;
   }
 
+  /**
+   * Rewrite aggregates that use GROUPING SETS. High-level steps:
+   *
+   * <ol>
+   * <li>Build the set of grouping keys (groupSets) required by the rewrite.
+   * This includes: the original groupSet, each grouping-set from the SQL,
+   * and additional combinations needed to expand DISTINCT agg
+   * (i.e. distinct args ∪ grouping set, plus any filter arg).
+   * <li>Compute a "fullGroupSet" (union of all bits) and run a bottom 
aggregate
+   * producing GROUPING(...) and per-group boolean markers.
+   * <li>From the bottom result, build upper aggregates that pick the
+   * appropriate values per original grouping set (using GROUPING id and 
presence indicators).
+   * Finally apply a filter to remove "internal-only" rows that have no 
contributing input.
+   * </ol>
+   */
   private static void rewriteUsingGroupingSets(RelOptRuleCall call,
       Aggregate aggregate) {
+    final ImmutableBitSet aggregateGroupSet = aggregate.getGroupSet();
+    final ImmutableList<ImmutableBitSet> aggregateGroupingSets = 
aggregate.getGroupSets();
+
     final Set<ImmutableBitSet> groupSetTreeSet =
         new TreeSet<>(ImmutableBitSet.ORDERING);
-    // GroupSet to distinct filter arg map,
-    // filterArg will be -1 for non-distinct agg call.
-
-    // Using `Set` here because it's possible that two agg calls
-    // have different filterArgs but same groupSet.
+    // Map from a grouping combo -> which filter args (if any) contributed
+    // to that combo (used to generate boolean markers later).
     final Map<ImmutableBitSet, Set<Integer>> distinctFilterArgMap = new 
HashMap<>();
+
+    BiConsumer<ImmutableBitSet, Integer> addGroupSet = (groupSet, filterArg) 
-> {
+      groupSetTreeSet.add(groupSet);
+      distinctFilterArgMap.computeIfAbsent(groupSet, g -> new 
HashSet<>()).add(filterArg);
+    };
+
+    // Always include the base group set and each declared grouping set.

Review Comment:
   you have to explain what -1 means here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CALCITE-5465] Rule of AGGREGATE_EXPAND_DISTINCT_AGGREGATES transforms wrong plan, when sql has distinct agg-call with rollup [calcite]

Reply via email to