Re: [PR] [fix](nereids) allow constant output column not in GROUP BY under only_full_group_by [doris]

via GitHub Fri, 12 Jun 2026 02:17:53 -0700


github-actions[bot] commented on code in PR #64458:
URL: https://github.com/apache/doris/pull/64458#discussion_r3402101106



##########
regression-test/suites/nereids_syntax_p0/test_group_by_constant_output.groovy:
##########
@@ -0,0 +1,51 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_group_by_constant_output") {
+    sql "SET enable_nereids_planner=true"
+    sql "SET enable_fallback_to_original_planner=false"
+
+    // The output alias 'b' collides with the derived-table column 'b', so 
under only_full_group_by the
+    // column 'b' is grouped and the constant column 'a' is left ungrouped. 
Because 'a' is constant for
+    // every input row it is accepted (MySQL functional-dependency behavior), 
and the query returns (1, 2).
+    def r1 = sql "SELECT a as b, b as c FROM (SELECT 1 as a, 2 as b) t1 GROUP 
BY b, c"
+    assertEquals(1, r1.size())
+    assertEquals(1, (r1[0][0] as int))
+    assertEquals(2, (r1[0][1] as int))
+
+    // A constant column not present in GROUP BY is allowed even though it is 
neither grouped nor aggregated.
+    def r2 = sql "SELECT a, b FROM (SELECT 1 as a, 2 as b) t1 GROUP BY a"
+    assertEquals(1, r2.size())
+    assertEquals(1, (r2[0][0] as int))
+    assertEquals(2, (r2[0][1] as int))
+
+    // The same shape over a real table leaves a non-constant column 
ungrouped, which is still rejected
+    // (matching MySQL, where only constant / functionally-dependent columns 
are allowed).
+    sql "DROP TABLE IF EXISTS test_gb_const_t"
+    sql """
+        CREATE TABLE test_gb_const_t (a INT, b INT) ENGINE=OLAP
+            DUPLICATE KEY(a)
+            DISTRIBUTED BY HASH(a) BUCKETS 1
+        PROPERTIES ('replication_num' = '1')
+    """
+    test {
+        sql "SELECT a as b, b as c FROM test_gb_const_t GROUP BY b, c"
+        exception "must appear in the GROUP BY"
+    }
+
+    sql "DROP TABLE IF EXISTS test_gb_const_t"

Review Comment:
   Please do not drop the regression table at the end. The repository test 
rules require dropping before use and preserving the final table state for 
debugging after a failure. The `DROP TABLE IF EXISTS` before create is enough; 
remove this final drop.



##########
regression-test/suites/nereids_syntax_p0/test_group_by_constant_output.groovy:
##########
@@ -0,0 +1,51 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_group_by_constant_output") {
+    sql "SET enable_nereids_planner=true"
+    sql "SET enable_fallback_to_original_planner=false"
+
+    // The output alias 'b' collides with the derived-table column 'b', so 
under only_full_group_by the
+    // column 'b' is grouped and the constant column 'a' is left ungrouped. 
Because 'a' is constant for
+    // every input row it is accepted (MySQL functional-dependency behavior), 
and the query returns (1, 2).
+    def r1 = sql "SELECT a as b, b as c FROM (SELECT 1 as a, 2 as b) t1 GROUP 
BY b, c"
+    assertEquals(1, r1.size())

Review Comment:
   The added regression test is checking deterministic result rows with 
`assertEquals`. The repository regression-test rules require determined 
expected results to be generated via `qt_sql`/`order_qt_*` and the generated 
`.out` file, rather than manual assertions. Please convert these positive cases 
to query tests with deterministic ordering as needed.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/NormalizeAggregate.java:
##########
@@ -300,37 +300,51 @@ public LogicalPlan normalizeAgg(LogicalAggregate<Plan> 
aggregate, Optional<Logic
             }
             if (!missingSlotsInAggregate.isEmpty()) {
                 if (SqlModeHelper.hasOnlyFullGroupBy()) {
-                    throw new AnalysisException(String.format("PROJECT 
expression %s must appear in the GROUP BY"
-                            + " clause or be used in an aggregate function",
-                            missingSlotsInAggregate.stream()
-                                    .map(slot -> "'" + slot.getName() + "'")
-                                    .collect(Collectors.joining(", "))));
-                } else {
-                    // for any slots missing in aggregate's output, we should 
add a any_value(slot) into
-                    // aggregate's output list and slot itself into bottom 
project's output list
-                    bottomProjects = Sets.union(bottomProjects, 
missingSlotsInAggregate);
-                    Map<Expression, Expression> replaceMap = Maps.newHashMap();
-                    Map<String, Alias> normalizedAggExistingAlias = 
Maps.newHashMap();
-                    for (NamedExpression output : 
normalizedAggOutputBuilder.build()) {
-                        if (output instanceof Alias) {
-                            normalizedAggExistingAlias.put(output.getName(), 
(Alias) output);
-                        }
-                    }
+                    // Under only_full_group_by, a non-aggregated output 
expression is allowed only when it
+                    // is constant for every input row (a uniform slot). This 
matches MySQL, which accepts
+                    // such expressions by functional dependency, e.g.
+                    //   SELECT a AS b, b AS c FROM (SELECT 1 AS a, 2 AS b) t 
GROUP BY b, c
+                    // here 'a' is a constant column of the derived table. 
Non-constant missing slots are
+                    // still rejected. The constant ones fall through to the 
any_value() wrapping below
+                    // (any_value of a constant is that constant), so the 
result stays unambiguous.
+                    DataTrait childTrait = 
aggregate.child().getLogicalProperties().getTrait();
+                    List<String> invalidMissingSlots = new ArrayList<>();
                     for (Slot slot : missingSlotsInAggregate) {
-                        AnyValue anyValue = new AnyValue(false, 
normalizedGroupExprs.isEmpty(), slot);
-                        Alias exisitingAlias = 
normalizedAggExistingAlias.get(slot.getName());
-                        if (exisitingAlias != null && 
anyValue.equals(exisitingAlias.child())) {
-                            replaceMap.put(slot, exisitingAlias.toSlot());
-                        } else {
-                            Alias anyValueAlias = new Alias(anyValue, 
slot.getName());
-                            replaceMap.put(slot, anyValueAlias.toSlot());
-                            normalizedAggOutputBuilder.add(anyValueAlias);
+                        if (!childTrait.isUniform(slot)) {
+                            invalidMissingSlots.add("'" + slot.getName() + 
"'");
                         }
                     }
-                    upperProjects = upperProjects.stream()
-                            .map(e -> (NamedExpression) 
ExpressionUtils.replace(e, replaceMap))
-                            .collect(ImmutableList.toImmutableList());
+                    if (!invalidMissingSlots.isEmpty()) {
+                        throw new AnalysisException(String.format("PROJECT 
expression %s must appear in the GROUP BY"
+                                + " clause or be used in an aggregate 
function",
+                                String.join(", ", invalidMissingSlots)));
+                    }
+                }
+                // For any slots missing in aggregate's output, we add an 
any_value(slot) into aggregate's
+                // output list and the slot itself into the bottom project's 
output list. When
+                // only_full_group_by is enabled, all remaining missing slots 
are constant (checked above).
+                bottomProjects = Sets.union(bottomProjects, 
missingSlotsInAggregate);

Review Comment:
   Appending the `any_value` alias to the builder is not enough when 
`eliminateGroupByConstant()` runs. `normalizedAggOutput` was built before 
missing slots are added, and that stale list is passed into 
`eliminateGroupByConstant()`. If the group key is a foldable constant, for 
example `SELECT a, b FROM (SELECT 1 AS a, 2 AS b) t GROUP BY 'g'`, 
`upperProjects` is rewritten to these new `any_value` slots here, but 
constant-key elimination rebuilds the aggregate from the stale 
`normalizedAggOutput` and drops the aliases. The resulting upper project can 
reference slots the aggregate no longer outputs. Please pass the final 
aggregate output, or update constant-key elimination to preserve these newly 
added aliases, and add a test for this combination.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/NormalizeAggregate.java:
##########
@@ -300,37 +300,51 @@ public LogicalPlan normalizeAgg(LogicalAggregate<Plan> 
aggregate, Optional<Logic
             }
             if (!missingSlotsInAggregate.isEmpty()) {
                 if (SqlModeHelper.hasOnlyFullGroupBy()) {
-                    throw new AnalysisException(String.format("PROJECT 
expression %s must appear in the GROUP BY"
-                            + " clause or be used in an aggregate function",
-                            missingSlotsInAggregate.stream()
-                                    .map(slot -> "'" + slot.getName() + "'")
-                                    .collect(Collectors.joining(", "))));
-                } else {
-                    // for any slots missing in aggregate's output, we should 
add a any_value(slot) into
-                    // aggregate's output list and slot itself into bottom 
project's output list
-                    bottomProjects = Sets.union(bottomProjects, 
missingSlotsInAggregate);
-                    Map<Expression, Expression> replaceMap = Maps.newHashMap();
-                    Map<String, Alias> normalizedAggExistingAlias = 
Maps.newHashMap();
-                    for (NamedExpression output : 
normalizedAggOutputBuilder.build()) {
-                        if (output instanceof Alias) {
-                            normalizedAggExistingAlias.put(output.getName(), 
(Alias) output);
-                        }
-                    }
+                    // Under only_full_group_by, a non-aggregated output 
expression is allowed only when it
+                    // is constant for every input row (a uniform slot). This 
matches MySQL, which accepts
+                    // such expressions by functional dependency, e.g.

Review Comment:
   This predicate is too weak for `only_full_group_by`. `DataTrait.isUniform()` 
can be propagated from the nullable side of an outer join via 
`LogicalJoin.computeUniform()` -> `addUniformSlotForOuterJoinNullableSide()`, 
where a slot can still be `NULL` on unmatched rows and non-`NULL` on matched 
rows in the same later group. For example, with nullable `t_v.v`: `SELECT r.v 
FROM (SELECT 1 AS g, 1 AS k UNION ALL SELECT 1 AS g, 2 AS k) l LEFT JOIN 
(SELECT 1 AS k, v FROM t_v LIMIT 1) r ON l.k = r.k GROUP BY l.g`. The right 
`LIMIT 1` marks `r.v` uniform, the left join propagates it, and this branch now 
wraps `r.v` in `any_value()` even though the group can contain both `v` and 
`NULL`. That changes a query that should remain rejected into an ambiguous 
result. Please require a trait that proves one value for every aggregate input 
row, or otherwise account for nullable outer-join extension, before allowing 
the slot.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [fix](nereids) allow constant output column not in GROUP BY under only_full_group_by [doris]

Reply via email to