github-actions[bot] commented on code in PR #64458: URL: https://github.com/apache/doris/pull/64458#discussion_r3402101106
########## regression-test/suites/nereids_syntax_p0/test_group_by_constant_output.groovy: ########## @@ -0,0 +1,51 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +suite("test_group_by_constant_output") { + sql "SET enable_nereids_planner=true" + sql "SET enable_fallback_to_original_planner=false" + + // The output alias 'b' collides with the derived-table column 'b', so under only_full_group_by the + // column 'b' is grouped and the constant column 'a' is left ungrouped. Because 'a' is constant for + // every input row it is accepted (MySQL functional-dependency behavior), and the query returns (1, 2). + def r1 = sql "SELECT a as b, b as c FROM (SELECT 1 as a, 2 as b) t1 GROUP BY b, c" + assertEquals(1, r1.size()) + assertEquals(1, (r1[0][0] as int)) + assertEquals(2, (r1[0][1] as int)) + + // A constant column not present in GROUP BY is allowed even though it is neither grouped nor aggregated. + def r2 = sql "SELECT a, b FROM (SELECT 1 as a, 2 as b) t1 GROUP BY a" + assertEquals(1, r2.size()) + assertEquals(1, (r2[0][0] as int)) + assertEquals(2, (r2[0][1] as int)) + + // The same shape over a real table leaves a non-constant column ungrouped, which is still rejected + // (matching MySQL, where only constant / functionally-dependent columns are allowed). + sql "DROP TABLE IF EXISTS test_gb_const_t" + sql """ + CREATE TABLE test_gb_const_t (a INT, b INT) ENGINE=OLAP + DUPLICATE KEY(a) + DISTRIBUTED BY HASH(a) BUCKETS 1 + PROPERTIES ('replication_num' = '1') + """ + test { + sql "SELECT a as b, b as c FROM test_gb_const_t GROUP BY b, c" + exception "must appear in the GROUP BY" + } + + sql "DROP TABLE IF EXISTS test_gb_const_t" Review Comment: Please do not drop the regression table at the end. The repository test rules require dropping before use and preserving the final table state for debugging after a failure. The `DROP TABLE IF EXISTS` before create is enough; remove this final drop. ########## regression-test/suites/nereids_syntax_p0/test_group_by_constant_output.groovy: ########## @@ -0,0 +1,51 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +suite("test_group_by_constant_output") { + sql "SET enable_nereids_planner=true" + sql "SET enable_fallback_to_original_planner=false" + + // The output alias 'b' collides with the derived-table column 'b', so under only_full_group_by the + // column 'b' is grouped and the constant column 'a' is left ungrouped. Because 'a' is constant for + // every input row it is accepted (MySQL functional-dependency behavior), and the query returns (1, 2). + def r1 = sql "SELECT a as b, b as c FROM (SELECT 1 as a, 2 as b) t1 GROUP BY b, c" + assertEquals(1, r1.size()) Review Comment: The added regression test is checking deterministic result rows with `assertEquals`. The repository regression-test rules require determined expected results to be generated via `qt_sql`/`order_qt_*` and the generated `.out` file, rather than manual assertions. Please convert these positive cases to query tests with deterministic ordering as needed. ########## fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/NormalizeAggregate.java: ########## @@ -300,37 +300,51 @@ public LogicalPlan normalizeAgg(LogicalAggregate<Plan> aggregate, Optional<Logic } if (!missingSlotsInAggregate.isEmpty()) { if (SqlModeHelper.hasOnlyFullGroupBy()) { - throw new AnalysisException(String.format("PROJECT expression %s must appear in the GROUP BY" - + " clause or be used in an aggregate function", - missingSlotsInAggregate.stream() - .map(slot -> "'" + slot.getName() + "'") - .collect(Collectors.joining(", ")))); - } else { - // for any slots missing in aggregate's output, we should add a any_value(slot) into - // aggregate's output list and slot itself into bottom project's output list - bottomProjects = Sets.union(bottomProjects, missingSlotsInAggregate); - Map<Expression, Expression> replaceMap = Maps.newHashMap(); - Map<String, Alias> normalizedAggExistingAlias = Maps.newHashMap(); - for (NamedExpression output : normalizedAggOutputBuilder.build()) { - if (output instanceof Alias) { - normalizedAggExistingAlias.put(output.getName(), (Alias) output); - } - } + // Under only_full_group_by, a non-aggregated output expression is allowed only when it + // is constant for every input row (a uniform slot). This matches MySQL, which accepts + // such expressions by functional dependency, e.g. + // SELECT a AS b, b AS c FROM (SELECT 1 AS a, 2 AS b) t GROUP BY b, c + // here 'a' is a constant column of the derived table. Non-constant missing slots are + // still rejected. The constant ones fall through to the any_value() wrapping below + // (any_value of a constant is that constant), so the result stays unambiguous. + DataTrait childTrait = aggregate.child().getLogicalProperties().getTrait(); + List<String> invalidMissingSlots = new ArrayList<>(); for (Slot slot : missingSlotsInAggregate) { - AnyValue anyValue = new AnyValue(false, normalizedGroupExprs.isEmpty(), slot); - Alias exisitingAlias = normalizedAggExistingAlias.get(slot.getName()); - if (exisitingAlias != null && anyValue.equals(exisitingAlias.child())) { - replaceMap.put(slot, exisitingAlias.toSlot()); - } else { - Alias anyValueAlias = new Alias(anyValue, slot.getName()); - replaceMap.put(slot, anyValueAlias.toSlot()); - normalizedAggOutputBuilder.add(anyValueAlias); + if (!childTrait.isUniform(slot)) { + invalidMissingSlots.add("'" + slot.getName() + "'"); } } - upperProjects = upperProjects.stream() - .map(e -> (NamedExpression) ExpressionUtils.replace(e, replaceMap)) - .collect(ImmutableList.toImmutableList()); + if (!invalidMissingSlots.isEmpty()) { + throw new AnalysisException(String.format("PROJECT expression %s must appear in the GROUP BY" + + " clause or be used in an aggregate function", + String.join(", ", invalidMissingSlots))); + } + } + // For any slots missing in aggregate's output, we add an any_value(slot) into aggregate's + // output list and the slot itself into the bottom project's output list. When + // only_full_group_by is enabled, all remaining missing slots are constant (checked above). + bottomProjects = Sets.union(bottomProjects, missingSlotsInAggregate); Review Comment: Appending the `any_value` alias to the builder is not enough when `eliminateGroupByConstant()` runs. `normalizedAggOutput` was built before missing slots are added, and that stale list is passed into `eliminateGroupByConstant()`. If the group key is a foldable constant, for example `SELECT a, b FROM (SELECT 1 AS a, 2 AS b) t GROUP BY 'g'`, `upperProjects` is rewritten to these new `any_value` slots here, but constant-key elimination rebuilds the aggregate from the stale `normalizedAggOutput` and drops the aliases. The resulting upper project can reference slots the aggregate no longer outputs. Please pass the final aggregate output, or update constant-key elimination to preserve these newly added aliases, and add a test for this combination. ########## fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/NormalizeAggregate.java: ########## @@ -300,37 +300,51 @@ public LogicalPlan normalizeAgg(LogicalAggregate<Plan> aggregate, Optional<Logic } if (!missingSlotsInAggregate.isEmpty()) { if (SqlModeHelper.hasOnlyFullGroupBy()) { - throw new AnalysisException(String.format("PROJECT expression %s must appear in the GROUP BY" - + " clause or be used in an aggregate function", - missingSlotsInAggregate.stream() - .map(slot -> "'" + slot.getName() + "'") - .collect(Collectors.joining(", ")))); - } else { - // for any slots missing in aggregate's output, we should add a any_value(slot) into - // aggregate's output list and slot itself into bottom project's output list - bottomProjects = Sets.union(bottomProjects, missingSlotsInAggregate); - Map<Expression, Expression> replaceMap = Maps.newHashMap(); - Map<String, Alias> normalizedAggExistingAlias = Maps.newHashMap(); - for (NamedExpression output : normalizedAggOutputBuilder.build()) { - if (output instanceof Alias) { - normalizedAggExistingAlias.put(output.getName(), (Alias) output); - } - } + // Under only_full_group_by, a non-aggregated output expression is allowed only when it + // is constant for every input row (a uniform slot). This matches MySQL, which accepts + // such expressions by functional dependency, e.g. Review Comment: This predicate is too weak for `only_full_group_by`. `DataTrait.isUniform()` can be propagated from the nullable side of an outer join via `LogicalJoin.computeUniform()` -> `addUniformSlotForOuterJoinNullableSide()`, where a slot can still be `NULL` on unmatched rows and non-`NULL` on matched rows in the same later group. For example, with nullable `t_v.v`: `SELECT r.v FROM (SELECT 1 AS g, 1 AS k UNION ALL SELECT 1 AS g, 2 AS k) l LEFT JOIN (SELECT 1 AS k, v FROM t_v LIMIT 1) r ON l.k = r.k GROUP BY l.g`. The right `LIMIT 1` marks `r.v` uniform, the left join propagates it, and this branch now wraps `r.v` in `any_value()` even though the group can contain both `v` and `NULL`. That changes a query that should remain rejected into an ambiguous result. Please require a trait that proves one value for every aggregate input row, or otherwise account for nullable outer-join extension, before allowing the slot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
