Re: [PR] fix(9678): short circuiting prevented population of visited stack, for common subexpr elimination optimization [arrow-datafusion]

via GitHub Wed, 20 Mar 2024 00:19:43 -0700


wiedld commented on code in PR #9685:
URL: https://github.com/apache/arrow-datafusion/pull/9685#discussion_r1531581836



##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -42,6 +41,15 @@ use datafusion_expr::{col, Expr, ExprSchemable};
 /// - DataType of this expression.
 type ExprSet = HashMap<Identifier, (Expr, usize, DataType)>;
 
+/// An ordered map of Identifiers encountered during visitation.
+///
+/// Is created in the ExprIdentifierVisitor, which identifies the common 
expressions.
+/// Is consumed in the CommonSubexprRewriter, which performs mutations.
+///
+/// - series_number. increased in fn_up, start from 1.
+/// - Identifier. is empty ("") if expr should not be considered for common 
elimation.
+type IdArray = Vec<(usize, Identifier)>;

Review Comment:
   This abstraction really helps to understand how the 2 visitors work 
together. 
   
   The first visitor builds the `IdArray` and `ExprSet`. The first visitor sets 
the vector idx used on f_down, whereas the series_number is set on f_up. An 
empty string for the expr_id means that this expression is not considered for 
rewrite (is skipped) in the second visitor.
   e.g. `[(3, expr_id),(2, expr_id),(1, expr_id)]`
   
   The second visitor then performs the mutation, based upon the `IdArray` and 
`ExprSet`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix(9678): short circuiting prevented population of visited stack, for common subexpr elimination optimization [arrow-datafusion]

Reply via email to