waynexia commented on code in PR #9700: URL: https://github.com/apache/arrow-datafusion/pull/9700#discussion_r1533433201
########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ########## @@ -53,10 +53,32 @@ type ExprSet = HashMap<Identifier, (Expr, usize, DataType)>; /// here is not such a good choose. type Identifier = String; -/// Perform Common Sub-expression Elimination optimization. +/// Performs Common Sub-expression Elimination optimization. /// -/// Currently only common sub-expressions within one logical plan will +/// This optimization improves query performance by computing expressions that +/// appear more than once and reusing those results rather than re-computing the +/// same value +/// +/// Currently only common sub-expressions within a single `LogicalPlan` are /// be eliminated. +/// +/// # Example +/// +/// Given a projection that computes the same expensive expression +/// multiple times such as parsing as string as a date with `to_date` twice: +/// +/// ```text +/// ProjectionExec(expr=[extract (day from to_date(c1)), extract (year from to_date(c1))]) +/// ``` +/// +/// This optimization will rewrite the plan to compute the common expression once +/// using a new `ProjectionExec` and then rewrite the original expressions to +/// refer to that new column. +/// +/// ```text +/// ProjectionExec(exprs=[extract (day from new_col), extract (year from new_col)]) <-- reuse here Review Comment: It is possible. The optimized plan will have an obvious intermediate `Projection` plan that does some computation and the result will be referred later by other following plans (this pattern might occur in other reasons though -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
