peter-toth opened a new pull request, #13618: URL: https://github.com/apache/datafusion/pull/13618
## Which issue does this PR close? This a proof of concept PR to improve performance of `TreeNode` traversals. The main purpose of this PR is to demonstrate a promising but API breaking optimization. ## Rationale for this change `TreeNode` traversal APIs are crucial parts of query plan analysis and optimzation. This PR explores the idea of storing some pre-calculatd statistics/properties nodes (or subtrees) inside the nodes during creation and automatically update the stats during transfomations. This PR focuses on logical plan building blocks (`LogicalPlan`, `Expr`) and only one particular optimization that stores a bitset pattern in each node to describe the subtree content, but additional attributes/properties can be added in follow-up PRs like: - The type of expressions to speed up `Expr::get_type()` (https://github.com/apache/datafusion/issues/9375#issuecomment-2477605188, https://github.com/apache/datafusion/issues/12604) - Hash / semantic hash of expressions to improve CSE (https://github.com/apache/datafusion/pull/13315#discussion_r1843848953) and the idea can be extended to physical trees as well. ## What changes are included in this PR? To store the pre-calculated statistics/properties `LogicalPlanStats` struct fields are added to each `LogicalPlan` and `Expr` nodes: ```rust pub enum LogicalPlan { Projection(Projection, LogicalPlanStats), Filter(Filter, LogicalPlanStats), ... } pub enum Expr { BinaryExpr(BinaryExpr, LogicalPlanStats), Like(Like, LogicalPlanStats), ... } ``` This might look redundant but most likely this is the least intrusive way to the existing code. Pattern matching against `LogicalPlan` and `Expr` just need to add a new param, while new enum constructor methods can be defined as lowercase version of the enum items (if there is no conflict with existing methods). The constructor methods calculate the `LogicalPlanStats` fields based on main content of the node. Please note that even this approach requites quite a lot of API breaking changes in the codebase, but all are mechanical. This table summarizes the before/after code to construct the enum items: | Before | After | | --- | --- | | `Expr::Alias` | `Expr::alias_qualified` | | `Expr::Column` | `Expr::column` | | `Expr::ScalarVariable` | `Expr::scalar_variable` | | `Expr::Literal` | `Expr::literal` | | `Expr::BinaryExpr` | `Expr::binary_expr` | | `Expr::Like` | `Expr::_like` | | `Expr::SimilarTo` | `Expr::similar_to` | | `Expr::Not` | `Expr::_not` | | `Expr::IsNotNull` | `Expr::_is_not_null` | | `Expr::IsNull` | `Expr::_is_null` | | `Expr::IsTrue` | `Expr::_is_true` | | `Expr::IsFalse` | `Expr::_is_false` | | `Expr::IsUnknown` | `Expr::_is_unknown` | | `Expr::IsNotTrue` | `Expr::_is_not_true` | | `Expr::IsNotFalse` | `Expr::_is_not_false` | | `Expr::IsNotUnknown` | `Expr::_is_not_unknown` | | `Expr::Negative` | `Expr::negative` | | `Expr::Between` | `Expr::_between` | | `Expr::Case` | `Expr::case` | | `Expr::Cast` | `Expr::cast` | | `Expr::TryCast` | `Expr:: try_cast| | `Expr::ScalarFunction` | `Expr::scalar_function` | | `Expr::AggregateFunction` | `Expr::aggregate_function` | | `Expr::WindowFunction` | `Expr::window_function` | | `Expr::InList` | `Expr::_in_list` | | `Expr::Exists` | `Expr::exists` | | `Expr::InSubquery` | `Expr::in_subquery` | | `Expr::ScalarSubquery` | `Expr::scalar_subquery` | | `Expr::Wildcard` | `Expr::wildcard` | | `Expr::GroupingSet` | `Expr::grouping_set` | | `Expr::Placeholder` | `Expr::placeholder` | | `Expr::OuterReferenceColumn` | `Expr::outer_reference_column` | | `Expr::Unnest` | `Expr::unnest` | | `LogicalPlan::Projection` | `LogicalPlan::projection` | | `LogicalPlan::Filter` | `LogicalPlan::filter` | | `LogicalPlan::Window` | `LogicalPlan::window` | | `LogicalPlan::Aggregate` | `LogicalPlan::aggregate` | | `LogicalPlan::Sort` | `LogicalPlan::sort` | | `LogicalPlan::Join` | `LogicalPlan::join` | | `LogicalPlan::Repartition` | `LogicalPlan::repartition` | | `LogicalPlan::Union` | `LogicalPlan::union` | | `LogicalPlan::TableScan` | `LogicalPlan::table_ccan` | | `LogicalPlan::EmptyRelation` | `LogicalPlan::empty_relation` | | `LogicalPlan::Subquery` | `LogicalPlan::subquery` | | `LogicalPlan::SubqueryAlias` | `LogicalPlan::subquery_alias` | | `LogicalPlan::Limit` | `LogicalPlan::limit` | | `LogicalPlan::Statement` | `LogicalPlan::statement` | | `LogicalPlan::Values` | `LogicalPlan::values` | | `LogicalPlan::Explain` | `LogicalPlan::explain` | | `LogicalPlan::Analyze` | `LogicalPlan::analyze` | | `LogicalPlan::Extension` | `LogicalPlan::extension` | | `LogicalPlan::Distinct` | `LogicalPlan::distinct` | | `LogicalPlan::Dml` | `LogicalPlan::dml` | | `LogicalPlan::Ddl` | `LogicalPlan::ddl` | | `LogicalPlan::Copy` | `LogicalPlan::copy` | | `LogicalPlan::DescribeTable` | `LogicalPlan::describe_table` | | `LogicalPlan::Unnest` | `LogicalPlan::unnest` | | `LogicalPlan::RecursiveQuery` | `LogicalPlan::recursive_query` | For the above mentioned pattern based optimization the PR defines a `LogicalPlanPattern` enum that contains all possible node kinds of a logical plan: ```rust pub enum LogicalPlanPattern { // [`Expr`] nodes ExprBinaryExpr, ExprLike, ... // [`LogicalPlan`] nodes LogicalPlanProjection, LogicalPlanFilter, ... } ``` A bitset of `LogicalPlanPattern` enum is added to the `LogicalPlanStats` struct to reflect the content of the node's subtree. The implementation could use any kind of bitset, but https://docs.rs/enumset/latest/enumset/ looks like a good candidate. ```rust pub struct LogicalPlanStats { patterns: EnumSet<LogicalPlanPattern>, } ``` For example here are a few `Expr` item constructors: ```rust impl Enum { pub fn binary_expr(binary_expr: BinaryExpr) -> Self { // A `LikeBinaryExpr` node contains the `ExprBinaryExpr` pattern and the patterns of its children let stats = LogicalPlanStats::new(enum_set!(LogicalPlanPattern::ExprBinaryExpr)).merge(binary_expr.stats()); Expr::BinaryExpr(binary_expr, stats) } pub fn _like(like: Like) -> Self { // A `Like` node contains the `ExprLike` pattern and the patterns of its children let stats = LogicalPlanStats::new(enum_set!(LogicalPlanPattern::ExprLike)).merge(like.stats()); Expr::Like(like, stats) } ... } ``` While maintaining the bitset during tree transformations comes with some costs, the pre-calculated bitset we can speed up `LogicalPlan` and `Expr` traversals significantly. For example if we have a traversal that does something with `Expr::BinaryExpr` nodes only: ```rust expr.apply(|e| { match e { Expr::BinaryExpr(..) => // do something _ => // do nothing } }) ``` then we can check the presence of `Expr::BinaryExpr` in a subtree during the traversal and simply skip traversing subtrees without the `LogicalPlanPattern::ExprBinaryExpr` pattern: ```rust expr.apply(|e| { if !e.stats().contains_pattern(enum_set!(LogicalPlanPattern::ExprBinaryExpr)) { return Ok(TreeNodeRecursion::Jump); } match e { Expr::BinaryExpr(..) => // do something _ => // do nothing } }) ``` I modified some of the traversal functions in this PR to demonstrate that the optimization brings significant performance improvement to `sql_planner`: ``` % critcmp main stats group main stats ----- ---- ----- logical_aggregate_with_join 1.15 649.7±183.81µs ? ?/sec 1.00 564.2±6.04µs ? ?/sec logical_select_all_from_1000 1.00 2.8±0.02ms ? ?/sec 1.00 2.8±0.03ms ? ?/sec logical_select_one_from_700 1.00 408.8±43.00µs ? ?/sec 1.00 408.9±45.43µs ? ?/sec logical_trivial_join_high_numbered_columns 1.00 396.2±6.83µs ? ?/sec 1.00 396.0±5.13µs ? ?/sec logical_trivial_join_low_numbered_columns 1.00 381.7±19.18µs ? ?/sec 1.00 383.5±30.79µs ? ?/sec physical_intersection 1.53 1034.7±13.33µs ? ?/sec 1.00 677.8±53.69µs ? ?/sec physical_join_consider_sort 1.49 1459.6±13.75µs ? ?/sec 1.00 979.1±75.49µs ? ?/sec physical_join_distinct 1.07 378.0±26.75µs ? ?/sec 1.00 353.8±12.57µs ? ?/sec physical_many_self_joins 1.38 10.4±0.61ms ? ?/sec 1.00 7.5±0.48ms ? ?/sec physical_plan_clickbench_all 1.15 89.8±0.71ms ? ?/sec 1.00 78.0±0.69ms ? ?/sec physical_plan_clickbench_q1 1.15 1301.7±210.59µs ? ?/sec 1.00 1136.1±14.32µs ? ?/sec physical_plan_clickbench_q10 1.15 1680.5±20.50µs ? ?/sec 1.00 1464.7±121.45µs ? ?/sec physical_plan_clickbench_q11 1.22 1785.3±195.51µs ? ?/sec 1.00 1458.5±19.58µs ? ?/sec physical_plan_clickbench_q12 1.12 1823.1±151.47µs ? ?/sec 1.00 1634.2±226.69µs ? ?/sec physical_plan_clickbench_q13 1.16 1655.8±203.04µs ? ?/sec 1.00 1423.1±159.93µs ? ?/sec physical_plan_clickbench_q14 1.12 1722.6±99.00µs ? ?/sec 1.00 1532.4±253.66µs ? ?/sec physical_plan_clickbench_q15 1.15 1673.2±16.20µs ? ?/sec 1.00 1455.6±120.38µs ? ?/sec physical_plan_clickbench_q16 1.13 1442.7±11.38µs ? ?/sec 1.00 1280.9±31.93µs ? ?/sec physical_plan_clickbench_q17 1.17 1515.6±123.83µs ? ?/sec 1.00 1296.1±24.30µs ? ?/sec physical_plan_clickbench_q18 1.12 1381.2±148.46µs ? ?/sec 1.00 1233.7±18.41µs ? ?/sec physical_plan_clickbench_q19 1.10 1689.3±51.93µs ? ?/sec 1.00 1532.3±140.70µs ? ?/sec physical_plan_clickbench_q2 1.12 1368.6±100.33µs ? ?/sec 1.00 1222.7±24.16µs ? ?/sec physical_plan_clickbench_q20 1.05 1237.2±14.74µs ? ?/sec 1.00 1182.7±219.27µs ? ?/sec physical_plan_clickbench_q21 1.12 1353.3±106.07µs ? ?/sec 1.00 1209.1±13.96µs ? ?/sec physical_plan_clickbench_q22 1.13 1817.7±299.10µs ? ?/sec 1.00 1610.8±27.71µs ? ?/sec physical_plan_clickbench_q23 1.12 2.0±0.21ms ? ?/sec 1.00 1797.8±176.74µs ? ?/sec physical_plan_clickbench_q24 1.28 2.5±0.37ms ? ?/sec 1.00 1960.8±138.84µs ? ?/sec physical_plan_clickbench_q25 1.12 1485.4±24.37µs ? ?/sec 1.00 1324.8±124.35µs ? ?/sec physical_plan_clickbench_q26 1.14 1373.9±21.27µs ? ?/sec 1.00 1203.5±14.13µs ? ?/sec physical_plan_clickbench_q27 1.19 1556.4±179.85µs ? ?/sec 1.00 1312.8±16.84µs ? ?/sec physical_plan_clickbench_q28 1.11 1787.1±108.99µs ? ?/sec 1.00 1602.9±148.44µs ? ?/sec physical_plan_clickbench_q29 1.13 2.3±0.02ms ? ?/sec 1.00 1989.6±135.85µs ? ?/sec physical_plan_clickbench_q3 1.08 1346.0±69.08µs ? ?/sec 1.00 1246.4±140.79µs ? ?/sec physical_plan_clickbench_q30 1.31 8.6±0.08ms ? ?/sec 1.00 6.5±0.08ms ? ?/sec physical_plan_clickbench_q31 1.12 1868.6±251.29µs ? ?/sec 1.00 1662.7±137.91µs ? ?/sec physical_plan_clickbench_q32 1.07 1846.9±118.97µs ? ?/sec 1.00 1725.8±222.02µs ? ?/sec physical_plan_clickbench_q33 1.09 1630.4±72.61µs ? ?/sec 1.00 1490.4±147.75µs ? ?/sec physical_plan_clickbench_q34 1.06 1434.8±17.97µs ? ?/sec 1.00 1348.2±111.82µs ? ?/sec physical_plan_clickbench_q35 1.09 1536.3±169.30µs ? ?/sec 1.00 1412.6±227.81µs ? ?/sec physical_plan_clickbench_q36 1.21 2.2±0.26ms ? ?/sec 1.00 1814.2±50.46µs ? ?/sec physical_plan_clickbench_q37 1.19 2.2±0.02ms ? ?/sec 1.00 1813.6±17.51µs ? ?/sec physical_plan_clickbench_q38 1.20 2.2±0.02ms ? ?/sec 1.00 1806.1±19.72µs ? ?/sec physical_plan_clickbench_q39 1.00 1920.2±123.71µs ? ?/sec 1.02 1952.7±496.68µs ? ?/sec physical_plan_clickbench_q4 1.06 1249.8±13.81µs ? ?/sec 1.00 1174.0±129.47µs ? ?/sec physical_plan_clickbench_q40 1.07 2.1±0.02ms ? ?/sec 1.00 1964.5±179.40µs ? ?/sec physical_plan_clickbench_q41 1.08 2.0±0.03ms ? ?/sec 1.00 1897.2±360.88µs ? ?/sec physical_plan_clickbench_q42 1.12 1983.9±163.89µs ? ?/sec 1.00 1765.5±124.55µs ? ?/sec physical_plan_clickbench_q43 1.13 2.0±0.25ms ? ?/sec 1.00 1789.7±137.35µs ? ?/sec physical_plan_clickbench_q44 1.09 1320.8±117.18µs ? ?/sec 1.00 1206.4±12.62µs ? ?/sec physical_plan_clickbench_q45 1.09 1312.0±97.44µs ? ?/sec 1.00 1201.4±32.21µs ? ?/sec physical_plan_clickbench_q46 1.04 1536.8±16.35µs ? ?/sec 1.00 1471.9±319.91µs ? ?/sec physical_plan_clickbench_q47 1.11 1785.4±14.40µs ? ?/sec 1.00 1612.5±269.34µs ? ?/sec physical_plan_clickbench_q48 1.17 2.1±0.16ms ? ?/sec 1.00 1763.0±114.77µs ? ?/sec physical_plan_clickbench_q49 1.16 2.1±0.10ms ? ?/sec 1.00 1841.9±53.85µs ? ?/sec physical_plan_clickbench_q5 1.10 1325.1±13.65µs ? ?/sec 1.00 1199.8±13.83µs ? ?/sec physical_plan_clickbench_q6 1.14 1371.1±87.24µs ? ?/sec 1.00 1201.4±12.84µs ? ?/sec physical_plan_clickbench_q7 1.11 1668.0±202.24µs ? ?/sec 1.00 1505.9±215.37µs ? ?/sec physical_plan_clickbench_q8 1.07 1458.5±106.34µs ? ?/sec 1.00 1360.3±268.51µs ? ?/sec physical_plan_clickbench_q9 1.11 1566.6±11.70µs ? ?/sec 1.00 1412.1±108.23µs ? ?/sec physical_plan_tpcds_all 1.35 655.1±2.04ms ? ?/sec 1.00 484.8±11.59ms ? ?/sec physical_plan_tpch_all 1.34 38.6±0.30ms ? ?/sec 1.00 28.7±0.36ms ? ?/sec physical_plan_tpch_q1 1.36 1217.8±49.60µs ? ?/sec 1.00 897.0±5.98µs ? ?/sec physical_plan_tpch_q10 1.28 1695.3±11.60µs ? ?/sec 1.00 1324.0±12.04µs ? ?/sec physical_plan_tpch_q11 1.21 1520.2±13.43µs ? ?/sec 1.00 1255.0±277.75µs ? ?/sec physical_plan_tpch_q12 1.48 1343.2±245.97µs ? ?/sec 1.00 910.5±59.66µs ? ?/sec physical_plan_tpch_q13 1.24 836.6±44.52µs ? ?/sec 1.00 674.1±47.79µs ? ?/sec physical_plan_tpch_q14 1.36 1025.8±11.05µs ? ?/sec 1.00 754.6±12.43µs ? ?/sec physical_plan_tpch_q16 1.35 1520.9±13.61µs ? ?/sec 1.00 1129.8±24.25µs ? ?/sec physical_plan_tpch_q17 1.37 1490.4±169.95µs ? ?/sec 1.00 1090.4±101.67µs ? ?/sec physical_plan_tpch_q18 1.34 1604.3±109.68µs ? ?/sec 1.00 1198.2±47.00µs ? ?/sec physical_plan_tpch_q19 1.65 3.0±0.03ms ? ?/sec 1.00 1826.9±11.52µs ? ?/sec physical_plan_tpch_q2 1.38 3.2±0.04ms ? ?/sec 1.00 2.4±0.02ms ? ?/sec physical_plan_tpch_q20 1.29 1970.1±19.51µs ? ?/sec 1.00 1523.3±164.47µs ? ?/sec physical_plan_tpch_q21 1.45 2.7±0.24ms ? ?/sec 1.00 1869.4±112.54µs ? ?/sec physical_plan_tpch_q22 1.31 1384.3±28.58µs ? ?/sec 1.00 1056.8±20.60µs ? ?/sec physical_plan_tpch_q3 1.35 1277.6±198.82µs ? ?/sec 1.00 945.0±66.16µs ? ?/sec physical_plan_tpch_q4 1.24 946.5±49.43µs ? ?/sec 1.00 765.5±153.06µs ? ?/sec physical_plan_tpch_q5 1.37 1805.6±138.24µs ? ?/sec 1.00 1321.7±11.72µs ? ?/sec physical_plan_tpch_q6 1.36 655.3±8.70µs ? ?/sec 1.00 482.4±6.81µs ? ?/sec physical_plan_tpch_q7 1.36 2.4±0.04ms ? ?/sec 1.00 1778.4±126.30µs ? ?/sec physical_plan_tpch_q8 1.41 2.9±0.10ms ? ?/sec 1.00 2.1±0.02ms ? ?/sec physical_plan_tpch_q9 1.33 2.2±0.17ms ? ?/sec 1.00 1654.2±444.38µs ? ?/sec physical_select_aggregates_from_200 1.17 15.2±0.59ms ? ?/sec 1.00 13.0±0.68ms ? ?/sec physical_select_all_from_1000 1.09 30.1±0.47ms ? ?/sec 1.00 27.7±0.38ms ? ?/sec physical_select_one_from_700 1.61 1597.4±109.43µs ? ?/sec 1.00 991.4±63.84µs ? ?/sec physical_theta_join_consider_sort 1.60 1776.6±333.62µs ? ?/sec 1.00 1111.7±10.90µs ? ?/sec physical_unnest_to_join 1.59 1503.9±34.19µs ? ?/sec 1.00 946.3±66.67µs ? ?/sec with_param_values_many_columns 20.70 84.5±0.64µs ? ?/sec 1.00 4.1±0.05µs ? ?/sec ``` To sum up this PR contains 3 initial commits: - The 1st commit is not closely related to the main puspose of the PR. It just refactors `Expr::Wildcard` to incorporate its fields to a named `Wildcard` struct. This makes `Expr::Wildcard` similar to other `Expr` items. - The 2nd commit adds `LogicalPlanStats` container to all `LogicalPlan` and `Expr` enum items. `LogicalPlanStats` is just an empty struct in this commit and all the changes are mechanical. - The 3rd commit implements plan pattern optimization in `LogicalPlanStats` and adjusts some of the logical plan traversals to utilize it. ## Are these changes tested? Yes, with existing tests. ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
