github-actions[bot] commented on code in PR #64293:
URL: https://github.com/apache/doris/pull/64293#discussion_r3381190891
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateEmptyRelation.java:
##########
@@ -230,6 +241,48 @@ private boolean bothChildrenEmpty(LogicalJoin<?, ?> join) {
return join.left() instanceof EmptyRelation && join.right() instanceof
EmptyRelation;
}
+ private Plan
rewriteCountLiteralOnEmptyAggregate(LogicalProject<LogicalAggregate<LogicalEmptyRelation>>
project) {
+ LogicalAggregate<LogicalEmptyRelation> aggregate = project.child();
+ if (!aggregate.getGroupByExpressions().isEmpty()) {
+ return null;
+ }
+
+ Set<ExprId> countStarSlots = new HashSet<>();
+ for (NamedExpression output : aggregate.getOutputExpressions()) {
+ if (output.anyMatch(expr -> expr instanceof Count && ((Count)
expr).isCountStar())) {
+ countStarSlots.add(output.toSlot().getExprId());
+ }
+ }
+ if (countStarSlots.isEmpty()) {
+ return null;
+ }
+
+ boolean changed = false;
+ List<NamedExpression> newProjects = new
ArrayList<>(project.getProjects().size());
Review Comment:
This pattern is not a safe proxy for “this `If` was generated by
`CountLiteralRewrite`”. It is too broad: a user-written query such as `SELECT
if(json_extract('{"id":123}', '$.') IS NULL, 0, count(*)) FROM empty_table`
should still evaluate the scalar `json_extract` in the SELECT expression and
raise the invalid-path error, but this rule replaces that whole alias with `0`.
It is also too narrow for generated expressions: `SELECT
count(json_extract('{"id":123}', '$.')) + 1 FROM empty_table` becomes an alias
whose child is `Add(If(...), 1)`, so the generated invalid expression is not
removed and the original empty-input bug remains. Please track or rewrite only
the generated `count(expr)` replacement recursively without changing equivalent
user-authored `IF` expressions, and add regression coverage for both examples.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateEmptyRelation.java:
##########
@@ -73,6 +81,9 @@ public List<Rule> buildRules() {
ConnectContext.get().getStatementContext().getNextRelationId(),
agg.getOutput())
).toRule(RuleType.ELIMINATE_AGG_ON_EMPTYRELATION),
Review Comment:
This compensation only runs after the child has already become
`LogicalEmptyRelation`, so it does not preserve the original aggregate
semantics for zero rows that are discovered only at execution time. For
example, with a non-empty table and a predicate that happens to match no rows,
`SELECT count(json_extract('{"id":123}', '$.')) FROM t WHERE id = 2` should
return `0` without evaluating the count argument, but this rewrite still leaves
`json_extract(..., '$.')` in the upper project over `count(*)`; the global
aggregate emits one row with `count(*) = 0`, then the project evaluates the
invalid JSON path and errors. External scans and filters that cannot be
statically pruned have the same shape. Please do not rely only on
`EliminateEmptyRelation` for correctness here; either make the runtime plan
avoid evaluating the argument when the aggregate count is zero, or restrict
this rewrite to expressions that are safe to evaluate even when the input is
empty.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/CountLiteralRewrite.java:
##########
@@ -107,10 +114,29 @@ private boolean isCountLiteral(AggregateFunction aggFunc)
{
&& aggFunc.child(0).isLiteral();
}
+ private boolean isCountConstantExpression(AggregateFunction aggFunc) {
+ if (aggFunc.isDistinct()
+ || !(aggFunc instanceof Count)
+ || aggFunc.children().size() != 1) {
+ return false;
+ }
Review Comment:
The `foldable`/deterministic/slot-free checks are not enough to prove that
evaluating the expression once above the aggregate is safe. `sleep(1)` has no
input slots, uses the default deterministic/foldable behavior, and is not
volatile, and this tree already has a special
`FoldConstantRuleOnBE.shouldSkipFold` case for `Sleep` because evaluating it
during folding can cause RPC timeouts. With this predicate, `count(sleep(1))`
is rewritten to `if(sleep(1) is null, 0, count(*))`, changing execution from
one sleep per input row to one sleep per output group. That is user-visible
behavior, not just a faster equivalent count. Please exclude
side-effecting/runtime-sensitive functions such as `Sleep` (ideally by sharing
the existing constant-fold safety exclusions or a stricter “safe to
pre-evaluate” predicate) and add a negative test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]