Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result [calcite]

via GitHub Fri, 09 Jan 2026 11:42:31 -0800


mihaibudiu commented on code in PR #4733:
URL: https://github.com/apache/calcite/pull/4733#discussion_r2677351924



##########
core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java:
##########
@@ -864,69 +870,155 @@ private void distinctify(
       throw new IllegalArgumentException("rel must not be null");
     }
     final RelNode rel = bb.root;
+    int groupCount = rel.getRowType().getFieldCount();
+    if (bb.scope != null && bb.scope.getNode() instanceof SqlSelect) {
+      groupCount = 
validator().getValidatedNodeType(bb.scope.getNode()).getFieldCount();
+    }
+    distinctify(bb, checkForDupExprs, groupCount);
+  }
+
+  /**
+   * Having translated 'SELECT ... FROM ... [GROUP BY ...] [HAVING ...]', adds

Review Comment:
   The translation of ... uses an Aggregate.



##########
core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java:
##########
@@ -5994,4 +5994,12 @@ void checkUserDefinedOrderByOver(NullCollation 
nullCollation) {
         + "and t1.ename in (select t3.ename from emp t3 )";
     sql(sql).ok();
   }
+
+  /** Test case of
+   * <a 
href="https://issues.apache.org/jira/browse/CALCITE-5597";>[CALCITE-5597]
+   * SELECT DISTINCT query with ORDER BY column will get error result</a>. */
+  @Test void testDistinctOrderByRand() {
+    final String sql = "select distinct deptno, empno from emp order by 
rand(), 1";

Review Comment:
   Can you add a constant to the list of selected fields: `SELECT DISTINCT 1, 
...`
   That's what I meant in my prior comment.
   



##########
core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java:
##########
@@ -864,69 +870,155 @@ private void distinctify(
       throw new IllegalArgumentException("rel must not be null");
     }
     final RelNode rel = bb.root;
+    int groupCount = rel.getRowType().getFieldCount();
+    if (bb.scope != null && bb.scope.getNode() instanceof SqlSelect) {
+      groupCount = 
validator().getValidatedNodeType(bb.scope.getNode()).getFieldCount();
+    }
+    distinctify(bb, checkForDupExprs, groupCount);
+  }
+
+  /**
+   * Having translated 'SELECT ... FROM ... [GROUP BY ...] [HAVING ...]', adds
+   * an {@link org.apache.calcite.rel.core.Aggregate} to make the results 
unique.
+   *
+   * <p>For example, {@code SELECT DISTINCT x FROM t ORDER BY y} is converted
+   * to an {@code Aggregate} on {@code (x, y)} if {@code y} is deterministic.
+   * If {@code y} is non-deterministic (e.g. {@code RAND()}), it is converted
+   * to an {@code Aggregate} on {@code x}, and {@code y} is applied over the
+   * result.
+   *
+   * <p>If the SELECT clause contains duplicate expressions, adds
+   * {@link org.apache.calcite.rel.logical.LogicalProject}s so that we are
+   * grouping on the minimal set of keys. The performance gain isn't huge, but
+   * it is difficult to detect these duplicate expressions later.
+   *
+   * @param bb               Blackboard
+   * @param checkForDupExprs Check for duplicate expressions
+   * @param groupCount       Number of fields in the SELECT clause
+   */
+  private void distinctify(
+      Blackboard bb,
+      boolean checkForDupExprs,
+      int groupCount) {
+    if (bb.root == null) {
+      throw new IllegalArgumentException("rel must not be null");
+    }
+    RelNode rel = bb.root;
+
+    // 1. Handle duplicate expressions in the Project if requested.
     if (checkForDupExprs && (rel instanceof LogicalProject)) {
-      LogicalProject project = (LogicalProject) rel;
+      final LogicalProject project = (LogicalProject) rel;
       final List<RexNode> projectExprs = project.getProjects();
       final List<Integer> origins = new ArrayList<>();
-      int dupCount = 0;
+      final Map<RexNode, Integer> seen = new HashMap<>();
       for (int i = 0; i < projectExprs.size(); i++) {
-        int x = projectExprs.indexOf(projectExprs.get(i));
-        if (x >= 0 && x < i) {
-          origins.add(x);
-          ++dupCount;
-        } else {
-          origins.add(i);
+        Integer first = seen.putIfAbsent(projectExprs.get(i), i);
+        origins.add(first != null ? first : i);
+      }
+
+      if (seen.size() < projectExprs.size()) {
+        final List<RelDataTypeField> fields = rel.getRowType().getFieldList();
+        final PairList<RexNode, String> newProjects = PairList.of();
+        final List<Integer> mapping = new ArrayList<>();
+        for (int i = 0; i < fields.size(); i++) {
+          if (origins.get(i) == i) {
+            mapping.add(newProjects.size());
+            RexInputRef.add2(newProjects, i, fields);
+          } else {
+            mapping.add(-1);
+          }
         }
-      }
-      if (dupCount == 0) {
-        distinctify(bb, false);
+        bb.setRoot(
+            LogicalProject.create(rel, ImmutableList.of(),
+                newProjects.leftList(), newProjects.rightList(),
+                project.getVariablesSet()), false);
+
+        int newGroupCount = 0;
+        for (int i = 0; i < groupCount; i++) {
+          if (origins.get(i) == i) {

Review Comment:
   this looks strange, can't origin contain values larger than groupCount?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result [calcite]

Reply via email to