Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result [calcite]

via GitHub Fri, 09 Jan 2026 16:35:37 -0800


xiedeyantu commented on code in PR #4733:
URL: https://github.com/apache/calcite/pull/4733#discussion_r2678012745



##########
core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java:
##########
@@ -864,69 +870,155 @@ private void distinctify(
       throw new IllegalArgumentException("rel must not be null");
     }
     final RelNode rel = bb.root;
+    int groupCount = rel.getRowType().getFieldCount();
+    if (bb.scope != null && bb.scope.getNode() instanceof SqlSelect) {
+      groupCount = 
validator().getValidatedNodeType(bb.scope.getNode()).getFieldCount();
+    }
+    distinctify(bb, checkForDupExprs, groupCount);
+  }
+
+  /**
+   * Having translated 'SELECT ... FROM ... [GROUP BY ...] [HAVING ...]', adds
+   * an {@link org.apache.calcite.rel.core.Aggregate} to make the results 
unique.
+   *
+   * <p>For example, {@code SELECT DISTINCT x FROM t ORDER BY y} is converted
+   * to an {@code Aggregate} on {@code (x, y)} if {@code y} is deterministic.
+   * If {@code y} is non-deterministic (e.g. {@code RAND()}), it is converted
+   * to an {@code Aggregate} on {@code x}, and {@code y} is applied over the
+   * result.
+   *
+   * <p>If the SELECT clause contains duplicate expressions, adds
+   * {@link org.apache.calcite.rel.logical.LogicalProject}s so that we are
+   * grouping on the minimal set of keys. The performance gain isn't huge, but
+   * it is difficult to detect these duplicate expressions later.
+   *
+   * @param bb               Blackboard
+   * @param checkForDupExprs Check for duplicate expressions
+   * @param groupCount       Number of fields in the SELECT clause
+   */
+  private void distinctify(
+      Blackboard bb,
+      boolean checkForDupExprs,
+      int groupCount) {
+    if (bb.root == null) {
+      throw new IllegalArgumentException("rel must not be null");
+    }
+    RelNode rel = bb.root;
+
+    // 1. Handle duplicate expressions in the Project if requested.
     if (checkForDupExprs && (rel instanceof LogicalProject)) {
-      LogicalProject project = (LogicalProject) rel;
+      final LogicalProject project = (LogicalProject) rel;
       final List<RexNode> projectExprs = project.getProjects();
       final List<Integer> origins = new ArrayList<>();
-      int dupCount = 0;
+      final Map<RexNode, Integer> seen = new HashMap<>();
       for (int i = 0; i < projectExprs.size(); i++) {
-        int x = projectExprs.indexOf(projectExprs.get(i));
-        if (x >= 0 && x < i) {
-          origins.add(x);
-          ++dupCount;
-        } else {
-          origins.add(i);
+        Integer first = seen.putIfAbsent(projectExprs.get(i), i);
+        origins.add(first != null ? first : i);
+      }
+
+      if (seen.size() < projectExprs.size()) {
+        final List<RelDataTypeField> fields = rel.getRowType().getFieldList();
+        final PairList<RexNode, String> newProjects = PairList.of();
+        final List<Integer> mapping = new ArrayList<>();
+        for (int i = 0; i < fields.size(); i++) {
+          if (origins.get(i) == i) {
+            mapping.add(newProjects.size());
+            RexInputRef.add2(newProjects, i, fields);
+          } else {
+            mapping.add(-1);
+          }
         }
-      }
-      if (dupCount == 0) {
-        distinctify(bb, false);
+        bb.setRoot(
+            LogicalProject.create(rel, ImmutableList.of(),
+                newProjects.leftList(), newProjects.rightList(),
+                project.getVariablesSet()), false);
+
+        int newGroupCount = 0;
+        for (int i = 0; i < groupCount; i++) {
+          if (origins.get(i) == i) {

Review Comment:
   In the current SELECT DISTINCT scenario, all columns are placed in the 
groupSet when the AGG is initially constructed, so this is true.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CALCITE-5597] SELECT DISTINCT query with ORDER BY column will get error result [calcite]

Reply via email to