github-actions[bot] commented on code in PR #64293:
URL: https://github.com/apache/doris/pull/64293#discussion_r3381190891


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateEmptyRelation.java:
##########
@@ -230,6 +241,48 @@ private boolean bothChildrenEmpty(LogicalJoin<?, ?> join) {
         return join.left() instanceof EmptyRelation && join.right() instanceof 
EmptyRelation;
     }
 
+    private Plan 
rewriteCountLiteralOnEmptyAggregate(LogicalProject<LogicalAggregate<LogicalEmptyRelation>>
 project) {
+        LogicalAggregate<LogicalEmptyRelation> aggregate = project.child();
+        if (!aggregate.getGroupByExpressions().isEmpty()) {
+            return null;
+        }
+
+        Set<ExprId> countStarSlots = new HashSet<>();
+        for (NamedExpression output : aggregate.getOutputExpressions()) {
+            if (output.anyMatch(expr -> expr instanceof Count && ((Count) 
expr).isCountStar())) {
+                countStarSlots.add(output.toSlot().getExprId());
+            }
+        }
+        if (countStarSlots.isEmpty()) {
+            return null;
+        }
+
+        boolean changed = false;
+        List<NamedExpression> newProjects = new 
ArrayList<>(project.getProjects().size());

Review Comment:
   This pattern is not a safe proxy for “this `If` was generated by 
`CountLiteralRewrite`”. It is too broad: a user-written query such as `SELECT 
if(json_extract('{"id":123}', '$.') IS NULL, 0, count(*)) FROM empty_table` 
should still evaluate the scalar `json_extract` in the SELECT expression and 
raise the invalid-path error, but this rule replaces that whole alias with `0`. 
It is also too narrow for generated expressions: `SELECT 
count(json_extract('{"id":123}', '$.')) + 1 FROM empty_table` becomes an alias 
whose child is `Add(If(...), 1)`, so the generated invalid expression is not 
removed and the original empty-input bug remains. Please track or rewrite only 
the generated `count(expr)` replacement recursively without changing equivalent 
user-authored `IF` expressions, and add regression coverage for both examples.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/EliminateEmptyRelation.java:
##########
@@ -73,6 +81,9 @@ public List<Rule> buildRules() {
                     
ConnectContext.get().getStatementContext().getNextRelationId(),
                     agg.getOutput())
                 ).toRule(RuleType.ELIMINATE_AGG_ON_EMPTYRELATION),

Review Comment:
   This compensation only runs after the child has already become 
`LogicalEmptyRelation`, so it does not preserve the original aggregate 
semantics for zero rows that are discovered only at execution time. For 
example, with a non-empty table and a predicate that happens to match no rows, 
`SELECT count(json_extract('{"id":123}', '$.')) FROM t WHERE id = 2` should 
return `0` without evaluating the count argument, but this rewrite still leaves 
`json_extract(..., '$.')` in the upper project over `count(*)`; the global 
aggregate emits one row with `count(*) = 0`, then the project evaluates the 
invalid JSON path and errors. External scans and filters that cannot be 
statically pruned have the same shape. Please do not rely only on 
`EliminateEmptyRelation` for correctness here; either make the runtime plan 
avoid evaluating the argument when the aggregate count is zero, or restrict 
this rewrite to expressions that are safe to evaluate even when the input is 
empty.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/CountLiteralRewrite.java:
##########
@@ -107,10 +114,29 @@ private boolean isCountLiteral(AggregateFunction aggFunc) 
{
                 && aggFunc.child(0).isLiteral();
     }
 
+    private boolean isCountConstantExpression(AggregateFunction aggFunc) {
+        if (aggFunc.isDistinct()
+                || !(aggFunc instanceof Count)
+                || aggFunc.children().size() != 1) {
+            return false;
+        }

Review Comment:
   The `foldable`/deterministic/slot-free checks are not enough to prove that 
evaluating the expression once above the aggregate is safe. `sleep(1)` has no 
input slots, uses the default deterministic/foldable behavior, and is not 
volatile, and this tree already has a special 
`FoldConstantRuleOnBE.shouldSkipFold` case for `Sleep` because evaluating it 
during folding can cause RPC timeouts. With this predicate, `count(sleep(1))` 
is rewritten to `if(sleep(1) is null, 0, count(*))`, changing execution from 
one sleep per input row to one sleep per output group. That is user-visible 
behavior, not just a faster equivalent count. Please exclude 
side-effecting/runtime-sensitive functions such as `Sleep` (ideally by sharing 
the existing constant-fold safety exclusions or a stricter “safe to 
pre-evaluate” predicate) and add a negative test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to