yujun777 commented on code in PR #62742:
URL: https://github.com/apache/doris/pull/62742#discussion_r3361059015


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AddProjectForVolatileExpression.java:
##########
@@ -269,6 +271,85 @@ public <T extends Expression> Optional<Pair<List<T>, 
LogicalProject<Plan>>> rewr
         return Optional.of(Pair.of(newTargetsBuilder.build(), new 
LogicalProject<>(projects, plan.child(0))));
     }
 
+    private Optional<JoinRewriteResult> 
rewriteJoinExpressions(LogicalJoin<Plan, Plan> join,
+            Collection<Expression> targets) {
+        Map<Expression, Integer> volatileExpressionCounter = 
Maps.newLinkedHashMap();
+        Map<Expression, Set<Slot>> volatileExpressionSlots = 
Maps.newLinkedHashMap();
+        for (Expression target : targets) {
+            target.foreach(e -> {
+                Expression expr = (Expression) e;
+                if (expr.isVolatile()) {
+                    volatileExpressionCounter.merge(expr, 1, Integer::sum);
+                    Set<Slot> volatileInputSlots = expr.getInputSlots();
+                    volatileExpressionSlots
+                            .computeIfAbsent(expr, ignored -> 
Sets.newLinkedHashSet())
+                            .addAll(volatileInputSlots.isEmpty() ? 
target.getInputSlots() : volatileInputSlots);
+                }
+            });
+        }
+
+        ImmutableList.Builder<NamedExpression> leftAliases = 
ImmutableList.builder();
+        ImmutableList.Builder<NamedExpression> rightAliases = 
ImmutableList.builder();
+        Map<Expression, Slot> replaceMap = Maps.newHashMap();
+        Set<Slot> leftOutputSet = join.left().getOutputSet();
+        Set<Slot> rightOutputSet = join.right().getOutputSet();
+        for (Entry<Expression, Integer> entry : 
volatileExpressionCounter.entrySet()) {
+            if (entry.getValue() <= 1) {
+                continue;
+            }
+            Set<Slot> inputSlots = volatileExpressionSlots.get(entry.getKey());
+            Set<Slot> volatileInputSlots = entry.getKey().getInputSlots();
+            if (!volatileInputSlots.isEmpty()
+                    && !leftOutputSet.containsAll(inputSlots)
+                    && !rightOutputSet.containsAll(inputSlots)) {
+                continue;
+            }
+            ExprId exprId = StatementScopeIdGenerator.newExprId();
+            String functionName = entry.getKey() instanceof Function
+                    ? ((Function) entry.getKey()).getName() : "volatile";
+            Alias alias = new Alias(exprId, entry.getKey(), "$_" + 
functionName + "_" + exprId.asInt() + "_$");
+            replaceMap.put(alias.child(0), alias.toSlot());
+            // Join can not add a project at join-pair scope, but repeated 
volatile expressions
+            // still need one materialized value. Slot-free volatile functions 
use the containing
+            // conjunct's slots to choose a side, so t2.k + rand() can project 
rand() on the right.
+            // Volatile functions with input slots use their own slots to 
avoid projecting
+            // volatile_udf(t2.k) on the left only because its containing 
conjunct also uses t1.
+            // Volatile functions whose own slots span both join children 
cannot be projected into
+            // either child, so they are not rewritten here.
+            // Put right-only expressions on the right child; otherwise keep 
the previous
+            // left-child behavior as the conservative default.
+            if (!inputSlots.isEmpty() && 
rightOutputSet.containsAll(inputSlots)) {
+                rightAliases.add(alias);
+            } else {

Review Comment:
   This is the same trade-off discussed earlier.
   
   I agree that child-side materialization is not perfectly equivalent to 
join-pair evaluation. But simply skipping this rewrite is not safe either, 
because after BETWEEN expansion a repeated volatile expression is evaluated 
multiple times on the join.
   
   For example, `rand() between 0.6 and 0.1` should never be true if the same 
rand value is used. After expansion, leaving it as `rand() >= 0.6 AND rand() <= 
0.1` evaluates `rand()` twice, so it can incorrectly become true.
   
   Since the current plan model cannot materialize a volatile expression at 
join-pair scope, we have two imperfect choices:
   
   1. leave duplicated volatile expressions on the join and allow internally 
inconsistent predicate evaluation;
   2. materialize once in a child, preserving single-value semantics but 
changing evaluation granularity.
   
   The current implementation chooses the second as the less-bad behavior. A 
fully correct fix would require either preserving volatile BETWEEN before 
expansion or adding join-pair-scope materialization support. So I do not think 
simply skipping repeated volatile ON expressions is a correctness improvement; 
it just switches back to the duplicated-evaluation bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to