Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result error result [calcite]

via GitHub Thu, 08 Jan 2026 10:08:31 -0800


mihaibudiu commented on code in PR #4722:
URL: https://github.com/apache/calcite/pull/4722#discussion_r2673346461



##########
core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java:
##########
@@ -879,10 +916,23 @@ private void distinctify(
         }
       }
       if (dupCount == 0) {
-        distinctify(bb, false);
+        distinctify(bb, false, groupCount);
         return;
       }
 
+      // Calculate the number of unique grouping columns in the SELECT list.

Review Comment:
   Can't this loop be combined with the one above?
   It may be even easier to read.



##########
core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java:
##########
@@ -5986,4 +5986,12 @@ void checkUserDefinedOrderByOver(NullCollation 
nullCollation) {
         + "and t1.ename in (select t3.ename from emp t3 )";
     sql(sql).ok();
   }
+
+  /** Test case of
+   * <a 
href="https://issues.apache.org/jira/browse/CALCITE-5597";>[CALCITE-5597]
+   * SELECT DISTINCT query with ORDER BY column will get error result</a>. */
+  @Test void testDistinctOrderByRand() {

Review Comment:
   Please add a test where some columns have constant values; Calcite 
represents constants in the select list in a special way, and it may interfere 
with the index computations you have.



##########
core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java:
##########
@@ -921,12 +971,78 @@ private void distinctify(
     }
 
     // Usual case: all expressions in the SELECT clause are different.
-    final ImmutableBitSet groupSet =
-        ImmutableBitSet.range(rel.getRowType().getFieldCount());
+    final int totalFieldCount = rel.getRowType().getFieldCount();
+    final Project project = rel instanceof Project ? (Project) rel : null;
+    final ImmutableBitSet.Builder groupSetBuilder = ImmutableBitSet.builder();
+    // The first groupCount columns are the columns in the user's SELECT list
+    // and must be part of the group set.
+    for (int i = 0; i < groupCount; i++) {
+      groupSetBuilder.set(i);
+    }
+    // For ORDER BY columns, only deterministic expressions can be in the 
group set.
+    // Non-deterministic functions like RAND() would make every row unique,
+    // breaking DISTINCT semantics (e.g., SELECT DISTINCT deptno ... ORDER BY 
RAND()).
+    for (int i = groupCount; i < totalFieldCount; i++) {
+      RexNode expr = project != null
+          ? project.getProjects().get(i) : null;
+      if (expr == null || RexUtil.isDeterministic(expr)) {
+        groupSetBuilder.set(i);
+      }
+    }
+    final ImmutableBitSet groupSet = groupSetBuilder.build();
+    ImmutableBitSet aggregateGroupSet = groupSet;
+    if (groupSet.cardinality() < totalFieldCount && project != null) {
+      // Trim the input project to remove unused non-deterministic expressions.
+      // This avoids evaluating them twice (once below the aggregate and once 
above).
+      final List<RexNode> newBottomProjects = new ArrayList<>();
+      final List<String> newBottomNames = new ArrayList<>();
+      for (int i = 0; i < totalFieldCount; i++) {
+        if (groupSet.get(i)) {
+          newBottomProjects.add(project.getProjects().get(i));
+          newBottomNames.add(rel.getRowType().getFieldNames().get(i));
+        }
+      }
+      RelNode inputRel =
+          LogicalProject.create(project.getInput(), project.getHints(),
+              newBottomProjects, newBottomNames, project.getVariablesSet());
+      bb.setRoot(inputRel, false);
+      aggregateGroupSet = ImmutableBitSet.range(newBottomProjects.size());
+    }
+
     bb.setRoot(
-        createAggregate(bb, groupSet, ImmutableList.of(groupSet),
+        createAggregate(bb, aggregateGroupSet, 
ImmutableList.of(aggregateGroupSet),
             ImmutableList.of()),
         false);
+
+    if (groupSet.cardinality() < totalFieldCount) {
+      final RelNode aggregate = bb.root();
+      final List<RexNode> newProjects = new ArrayList<>();
+      final List<String> newNames = rel.getRowType().getFieldNames();
+      final RexShuttle shuttle = new RexShuttle() {
+        @Override public RexNode visitInputRef(RexInputRef inputRef) {
+          int newIdx = groupSet.indexOf(inputRef.getIndex());
+          if (newIdx >= 0) {
+            return rexBuilder.makeInputRef(aggregate, newIdx);
+          }
+          return super.visitInputRef(inputRef);
+        }
+      };
+      for (int i = 0; i < totalFieldCount; i++) {
+        int newIdx = groupSet.indexOf(i);
+        if (newIdx >= 0) {
+          newProjects.add(rexBuilder.makeInputRef(aggregate, newIdx));
+        } else {
+          // If we are here, it means this column was excluded from the 
groupSet
+          // because it was non-deterministic. Such columns must come from a 
Project.

Review Comment:
   I wonder whether it wouldn't be clearer to just create an HashMap with these 
columns and fill it in the first loop above. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result error result [calcite]

Reply via email to