adriangb commented on code in PR #19538:
URL: https://github.com/apache/datafusion/pull/19538#discussion_r2721546456


##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -806,17 +807,65 @@ pub fn update_expr(
         RewrittenInvalid,
     }
 
+    // Track Arc pointers of columns created by pass 1.
+    // These should not be modified by pass 2.
+    // We use Arc pointer addresses (not name/index) to distinguish 
pass-1-created columns
+    // from original columns that happen to have the same name and index.
+    let mut pass1_created: HashSet<usize> = HashSet::new();
+
+    // First pass: try to rewrite the expression in terms of the projected 
expressions.
+    // For example, if the expression is `a + b > 5` and the projection is `a 
+ b AS sum_ab`,
+    // we can rewrite the expression to `sum_ab > 5` directly.
+    //
+    // This optimization only applies when sync_with_child=false, meaning we 
want the
+    // expression to use OUTPUT references (e.g., when pushing projection down 
and the
+    // expression will be above the projection). Pass 1 creates OUTPUT column 
references.
+    //
+    // When sync_with_child=true, we want INPUT references (expanding OUTPUT 
to INPUT),

Review Comment:
   Yeah this method has always... confused me. At the very least the boolean 
argument is not great (I understand there's a lot of shared code between the 
branches and that's why it is this way).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to