alamb commented on code in PR #9685: URL: https://github.com/apache/arrow-datafusion/pull/9685#discussion_r1532048878
########## datafusion/sqllogictest/test_files/common_subexpr_eliminate.slt: ##########
@@ -0,0 +1,106 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#############
+## Common Subexpr Eliminate Tests
+#############
+
+statement ok

Review Comment:
I recommend we move this test out of its own .slt file and into https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/expr.slt so it is easier to find.

########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ##########
@@ -532,18 +545,18 @@ impl ExprMask {
 /// Go through an expression tree and generate identifier.
 ///
 /// An identifier contains information of the expression itself and its sub-expression.
-/// This visitor implementation use a stack `visit_stack` to track traversal, which
+/// This visitor implementation use a stack `f_down()` to track traversal, which

Review Comment:
I think the original was correct (the stack was called `visit_stack`) -- it seems like the stack itself is created on `f_down`.

########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ##########
@@ -693,20 +713,26 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
         if expr.short_circuits() || is_volatile_expression(&expr)? {
             return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
         }
+
+        let (series_number, curr_id) = &self.id_array[self.curr_index];

Review Comment:
❤️

########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ##########
@@ -571,66 +584,73 @@ enum VisitRecord {
     /// `usize` is the monotone increasing series number assigned in pre_visit().
     /// Starts from 0. Is used to index the identifier array `id_array` in post_visit().
     EnterMark(usize),
+    /// the node's children were skipped => jump to f_up on same node

Review Comment:
This is the key fix, as I understand it -- the TreeNodeVisitor rewrite removed the notion of skipping sibling nodes during recursion, so this notion must be explicitly encoded in the subexpression rewrite pass.

########## datafusion/sqllogictest/test_files/common_subexpr_eliminate.slt: ##########
@@ -0,0 +1,106 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#############
+## Common Subexpr Eliminate Tests
+#############
+
+statement ok
+CREATE TABLE doubles (
+  f64 DOUBLE
+) as VALUES
+  (10.1)
+;
+
+# common subexpr with alias
+query RRR rowsort
+select f64, round(1.0 / f64) as i64_1, acos(round(1.0 / f64)) from doubles;
+----
+10.1 0 1.570796326795
+
+# common subexpr with coalesce (short-circuited)
+query RRR rowsort
+select f64, coalesce(1.0 / f64, 0.0), acos(coalesce(1.0 / f64, 0.0)) from doubles;
+----
+10.1 0.09900990099 1.471623942989
+
+# common subexpr with coalesce (short-circuited) and alias
+query RRR rowsort
+select f64, coalesce(1.0 / f64, 0.0) as f64_1, acos(coalesce(1.0 / f64, 0.0)) from doubles;
+----
+10.1 0.09900990099 1.471623942989
+
+# common subexpr with case (short-circuited)
+query RRR rowsort
+select f64, case when f64 > 0 then 1.0 / f64 else null end, acos(case when f64 > 0 then 1.0 / f64 else null end) from doubles;
+----
+10.1 0.09900990099 1.471623942989
+
+
+statement ok

Review Comment:
I believe this test is a duplicate of something elsewhere. While it was valuable for debugging, I think we should remove the duplication to make the code easier to maintain over the long run.

########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ##########
@@ -42,6 +41,15 @@ use datafusion_expr::{col, Expr, ExprSchemable};
 /// - DataType of this expression.
 type ExprSet = HashMap<Identifier, (Expr, usize, DataType)>;
+/// An ordered map of Identifiers encountered during visitation.
+///
+/// Is created in the ExprIdentifierVisitor, which identifies the common expressions.
+/// Is consumed in the CommonSubexprRewriter, which performs mutations.
+///
+/// - series_number. increased in fn_up, start from 1.
+/// - Identifier. is empty ("") if expr should not be considered for common elimation.
+type IdArray = Vec<(usize, Identifier)>;

Review Comment:
Thank you -- I agree introducing a new typedef helps to make the code easier to understand.
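To make the visitor/rewriter bookkeeping in these hunks concrete, here is a minimal, self-contained Rust sketch. It does not use DataFusion's `TreeNode` API. The toy `Expr`, `IdentifierVisitor`, `visit`, `render`, `example_exprs`, and the `JumpMark` variant name are all hypothetical (the diff only shows that variant's doc comment), and the identifier is simply the rendered expression text. The sketch shows how marks pushed onto `visit_stack` in `f_down` let `f_up` find the `id_array` slot of the node it is finishing, how an explicit jump marker records that a short-circuiting node's children were skipped (so its identifier stays empty), and how series numbers are assigned in `f_up` starting from 1.

```rust
/// Hypothetical toy stand-in for DataFusion's `Expr`.
enum Expr {
    Col(&'static str),
    Lit(i64),
    Div(Box<Expr>, Box<Expr>),
    Coalesce(Vec<Expr>), // short-circuiting: later arguments may never be evaluated
}

impl Expr {
    fn children(&self) -> Vec<&Expr> {
        match self {
            Expr::Col(_) | Expr::Lit(_) => vec![],
            Expr::Div(l, r) => vec![l.as_ref(), r.as_ref()],
            Expr::Coalesce(args) => args.iter().collect(),
        }
    }

    /// Full text of the expression, used directly as its identifier (a simplification).
    fn render(&self) -> String {
        match self {
            Expr::Col(name) => name.to_string(),
            Expr::Lit(v) => v.to_string(),
            Expr::Div(l, r) => format!("({} / {})", l.render(), r.render()),
            Expr::Coalesce(args) => {
                let parts: Vec<String> = args.iter().map(Expr::render).collect();
                format!("coalesce({})", parts.join(", "))
            }
        }
    }
}

type Identifier = String;
/// (series_number, identifier) per node in pre-order; "" = not a CSE candidate.
type IdArray = Vec<(usize, Identifier)>;

enum VisitRecord {
    /// Pre-order index (into `id_array`) of a node whose children are being visited.
    EnterMark(usize),
    /// Hypothetical name: the node's children were skipped => the next f_up is for this same node.
    JumpMark,
}

#[derive(Default)]
struct IdentifierVisitor {
    visit_stack: Vec<VisitRecord>,
    id_array: IdArray,
    series_number: usize,
}

impl IdentifierVisitor {
    /// Returns `false` to skip the children (the role `TreeNodeRecursion::Jump` plays).
    fn f_down(&mut self, expr: &Expr) -> bool {
        let idx = self.id_array.len();
        self.id_array.push((0, String::new())); // placeholder, filled in by f_up
        // Stand-in for `expr.short_circuits()`: do not descend into coalesce.
        if matches!(expr, Expr::Coalesce(_)) {
            self.visit_stack.push(VisitRecord::JumpMark);
            return false;
        }
        self.visit_stack.push(VisitRecord::EnterMark(idx));
        true
    }

    fn f_up(&mut self, expr: &Expr) {
        self.series_number += 1; // the first value assigned is 1
        match self.visit_stack.pop() {
            // Children were never visited: keep the empty identifier.
            Some(VisitRecord::JumpMark) => {}
            // Fill the slot that f_down reserved for this same node.
            Some(VisitRecord::EnterMark(idx)) => {
                self.id_array[idx] = (self.series_number, expr.render());
            }
            None => unreachable!("f_up without a matching f_down"),
        }
    }
}

/// Combined walk: f_down, then the children unless f_down jumped, then f_up on the same node.
fn visit(expr: &Expr, v: &mut IdentifierVisitor) {
    if v.f_down(expr) {
        for child in expr.children() {
            visit(child, v);
        }
    }
    v.f_up(expr);
}

/// Roughly: SELECT 1 / f64, (1 / f64) / 2, coalesce(1 / f64, 0)
fn example_exprs() -> Vec<Expr> {
    let one_over_f64 = || Expr::Div(Box::new(Expr::Lit(1)), Box::new(Expr::Col("f64")));
    let nested = Expr::Div(Box::new(one_over_f64()), Box::new(Expr::Lit(2)));
    vec![one_over_f64(), nested, Expr::Coalesce(vec![one_over_f64(), Expr::Lit(0)])]
}

fn main() {
    let mut v = IdentifierVisitor::default();
    for e in &example_exprs() {
        visit(e, &mut v);
    }
    println!("{:?}", v.id_array);
}
```

In this sketch both `(1 / f64)` nodes outside the `coalesce` end up with the same identifier, while the `coalesce` node keeps `(0, "")` and its children get no slots at all. Any rewriter indexing into the same `IdArray` (for example via a `curr_index` counter) therefore has to make the same jump decision to stay aligned.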
