alamb commented on code in PR #9871:
URL: https://github.com/apache/arrow-datafusion/pull/9871#discussion_r1545333347


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -35,37 +36,75 @@ use datafusion_expr::expr::Alias;
 use datafusion_expr::logical_plan::{Aggregate, LogicalPlan, Projection, 
Window};
 use datafusion_expr::{col, Expr, ExprSchemable};
 
-/// A map from expression's identifier to tuple including
-/// - the expression itself (cloned)
-/// - counter
-/// - DataType of this expression.
-type ExprSet = HashMap<Identifier, (Expr, usize, DataType)>;
+/// Set of expressions generated by the [`ExprIdentifierVisitor`]
+/// and consumed by the [`CommonSubexprRewriter`].
+#[derive(Default)]
+struct ExprSet {
+    /// A map from expression's identifier (stringified expr) to tuple 
including:
+    /// - the expression itself (cloned)
+    /// - counter
+    /// - DataType of this expression.
+    /// - symbol used as the identifier in the alias.
+    map: HashMap<Identifier, (Expr, usize, DataType, Identifier)>,

Review Comment:
   The amount of copying this requires is unfortunate (maybe it is N^2 in the 
number of exprs?) -- however, I don't think your changes make it any worse (or 
better). 
   
   I filed https://github.com/apache/arrow-datafusion/issues/9873 to track the 
issue. Maybe now that you know this code much better, a fun side project would 
be to make it much faster 🚀  (🎣 )
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to