alamb opened a new issue, #23264: URL: https://github.com/apache/datafusion/issues/23264
### Is your feature request related to a problem or challenge? DataFusion represents arbitrary binary Operator trees like ```sql (a OR (b OR (c OR (d OR ...))) ``` Using a binary tree structure called `BinaryExpr`: https://github.com/apache/datafusion/blob/2da88876f6e764ca3052c92b9d1783929d13d7bf/datafusion/expr/src/expr.rs#L770-L779 There are also many places in the codebase where recursive algorithms are used to walk over Expr trees, An example of this is `Treenode::apply`: https://github.com/apache/datafusion/blob/0838a4ddb902535b0e95a1c5a254be7e9c7fe9bf/datafusion/common/src/tree_node.rs#L196-L209 The stack depth is a function of the nesting depth of the expression, which means that > [!IMPORTANT] > In some cases processing deeply nested expressions can lean to a stack overflow (and thus a process `abort`) This shows up in practice from Machine-generated SQL/filters with thousands of OR/AND terms (e.g. IN lists rewritten to OR, or large predicate pushdown This representation has served us quite well, but we do have several mitigations 1. [`recursive` crate](https://crates.io/crates/recursive) + recursive_protection feature (segmented stack growth) The primary mitigation. Recursive TreeNode drivers are annotated `#[cfg_attr(feature = "recursive_protection", recursive::recursive)]`, which uses the recursive crate to grow the stack on the heap when it runs low, instead of overflowing. See the TreeNode walkers (6 sites — apply/transform_down/transform_up/rewrite/etc.): https://github.com/apache/datafusion/blob/0838a4ddb902535b0e95a1c5a254be7e9c7fe9bf/datafusion/common/src/tree_node.rs#L196-L209 2. SQL parser recursion limit (bounded recursion → error, not crash) The [sqlparser crate ](https://crates.io/crates/sqlparser) itself caps nesting depth (default 50) and returns RecursionLimitExceeded rather than overflowing the stack. 3. Explicit-stack (iterative) SQL→Expr conversion - See issue #1444 We also hit this a bunch when converting a [sqlparser](https://crates.io/crates/sqlparser) binary-operator tree into `Expr` and have a stack based conversion approach (see https://github.com/apache/datafusion/blob/0838a4ddb902535b0e95a1c5a254be7e9c7fe9bf/datafusion/sql/src/expr/mod.rs#L76-L120) 4. Out-of-line match arms in the expression simplifier Large match arms are extracted into separate functions specifically to shrink per-frame stack usage and avoid overflow during simplification. - https://github.com/apache/datafusion/blob/0838a4ddb902535b0e95a1c5a254be7e9c7fe9bf/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L2335-L2341 Recently people seem to be hitting recursion limits again and trying to put in patches, such as * https://github.com/apache/datafusion/pull/23198 from @mdashti * https://github.com/apache/datafusion/pull/22943 from @nathanb ### Describe the solution you'd like I would like a more fool proof way to avoid stack overflow issues ### Describe alternatives you've considered ## Option 1: n-ary expressions The approach taken by DuckDb Mysql and Postgres is to represent `AND` and `OR` as n-ary rather than binary (2) So that would be something like this (could also have an enum for And/Or) ```rust enum ConjunctionType { And, Or } /// AND/OR of some number of expressions struct Conjunction { conjunction_type: ConjunctionType, /// Logically `exprs[0] AND (exprs[1] AND (exprs[2] AND ...` exprs : Vec<Expr>. } ``` And then ```rust /// extend the Expr enum enum Expr { ... // AND Conjunction(Conjunction), ... } ``` ### Examples: DuckDB uses ConjunctionExpression: https://github.com/duckdb/duckdb/blob/b8a06e4a22672e254cd0baa68a3dbed2eb51c56e/src/include/duckdb/parser/expression/conjunction_expression.hpp#L17-L27 Postrgres uses BoolExpr https://github.com/postgres/postgres/blob/REL_17_0/src/include/nodes/primnodes.h#L929-L942 Mysql uses [`Item_cond_and` / `Item_cond_or`](https://github.com/mysql/mysql-server/blob/mysql-8.0.40/sql/item_cmpfunc.h#L2711-L2768) (based on [Item_cond base which stores args as a List<Item> list](https://github.com/mysql/mysql-server/blob/mysql-8.0.40/sql/item_cmpfunc.h#L2421-L2447); ### Cons This would be a major downstream API change Also unless we removed Operator::And / Operator::Or we would potentially have multiple ways to express the same logical expression ## Keep Binary Expr, have some sort of "balance" operation Another approach taken by spark is to use Binary expressions: https://github.com/apache/spark/blob/fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L828 [`buildBalancedPredicate`](https://github.com/apache/spark/blob/fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L160-L184) rebuilds a predicate list into a height-balanced tree in - https://github.com/apache/spark/pull/31724 ### Additional context Recent PRs * https://github.com/apache/datafusion/pull/23198 from @mdashti * https://github.com/apache/datafusion/pull/22943 from @nathanb 🎯 Umbrella / strategic (open) - #16788 — the epic to explore iterative rewrites / Box-ing / n-ary to remove the need for #[recursive] and regain the 1–2% planning overhead. - #14857 - #19787 - #14118 🐛 Open stack-overflow bugs (motivating cases) - #22229 - #22936 - #8900 - #23246 📜 Closed — the history that motivated the current mitigations - #1444 - #1434 - #9375 - #11102 - #16030 - #4066 - #3968 - #23056 - #9573 - #9373 - #15087 - TPC-DS planning overflows: #6277, #6264, #6040, #4786, #4065, #8837 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
