alamb commented on code in PR #7359: URL: https://github.com/apache/arrow-datafusion/pull/7359#discussion_r1305928861
########## docs/source/library-user-guide/working-with-exprs.md: ########## @@ -19,4 +19,166 @@ # Working with Exprs -Coming Soon +<!-- https://github.com/apache/arrow-datafusion/issues/7304 --> + +`Expr` is short for "expression". It is a core abstraction in DataFusion for representing a computation, and follows the standard "expression tree" abstraction found in most compilers and databases. + +For example, the SQL expression `a + b` would be represented as an `Expr` with a `BinaryExpr` variant. A `BinaryExpr` has a left and right `Expr` and an operator. + +As another example, the SQL expression `a + b * c` would be represented as an `Expr` with a `BinaryExpr` variant. The left `Expr` would be `a` and the right `Expr` would be another `BinaryExpr` with a left `Expr` of `b` and a right `Expr` of `c`. As a classic expression tree, this would look like: + +```text + ┌────────────────────┐ + │ BinaryExpr │ + │ op: + │ + └────────────────────┘ + ▲ ▲ + ┌───────┘ └────────────────┐ + │ │ +┌────────────────────┐ ┌────────────────────┐ +│ Expr::Col │ │ BinaryExpr │ +│ col: a │ │ op: * │ +└────────────────────┘ └────────────────────┘ + ▲ ▲ + ┌────────┘ └─────────┐ + │ │ + ┌────────────────────┐ ┌────────────────────┐ + │ Expr::Col │ │ Expr::Col │ + │ col: b │ │ col: c │ + └────────────────────┘ └────────────────────┘ +``` + +As the writer of a library, you may want to use or create `Expr`s to represent computations that you want to perform. This guide will walk you through how to make your own scalar UDF as an `Expr` and how to rewrite `Expr`s to inline the simple UDF. + +There are also executable examples for working with `Expr`s: + +- [rewrite_expr.rs](../../../datafusion-examples/examples/catalog.rs) +- [expr_api.rs](../../../datafusion-examples/examples/expr_api.rs) + +## A Scalar UDF Example + +We'll use a `ScalarUDF` expression as our example. This necessitates implementing an actual UDF, and for ease we'll use the same example from the [adding UDFs](./adding-udfs.md) guide. + +So assuming you've written that function, you can use it to create an `Expr`: + +```rust +let add_one_udf = create_udf( + "add_one", + vec![DataType::Int64], + Arc::new(DataType::Int64), + Volatility::Immutable, + make_scalar_function(add_one), // <-- the function we wrote +); + +// make the expr `add_one(5)` +let expr = add_one_udf.call(vec![lit(5)]); + +// make the expr `add_one(my_column)` +let expr = add_one_udf.call(vec![col("my_column")]); +``` + +If you'd like to learn more about `Expr`s, before we get into the details of creating and rewriting them, you can read the [expression user-guide](./../user-guide/expressions.md). + +## Rewriting Exprs + +Rewriting Expressions is the process of taking an `Expr` and transforming it into another `Expr`. This is useful for a number of reasons, including: + +- Simplifying `Expr`s to make them easier to evaluate +- Optimizing `Expr`s to make them faster to evaluate +- Converting `Expr`s to other forms, e.g. converting a `BinaryExpr` to a `CastExpr` + +In our example, we'll use rewriting to update our `add_one` UDF, to be rewritten as a `BinaryExpr` with a `Literal` of 1. We're effectively inlining the UDF. + +### Rewriting with `transform` + +To implement the inlining, we'll need to write a function that takes an `Expr` and returns a `Result<Expr>`. If the expression is _not_ to be rewritten `Transformed::No` is used to wrap the original `Expr`. If the expression _is_ to be rewritten, `Transformed::Yes` is used to wrap the new `Expr`. + +```rust +fn rewrite_add_one(expr: Expr) -> Result<Expr> { + expr.transform(&|expr| { + Ok(match expr { + Expr::ScalarUDF(scalar_fun) if scalar_fun.fun.name == "add_one" => { + let input_arg = scalar_fun.args[0].clone(); + let new_expression = input_arg + lit(1i64); + + Transformed::Yes(new_expression) + } + _ => Transformed::No(expr), + }) + }) +} +``` + +### Creating an `OptimizerRule` + +In DataFusion, an `OptimizerRule` is a trait that supports rewriting`Expr`s that appear in various parts of the `LogicalPlan`. It follows DataFusion's general mantra of trait implementations to drive behavior. + +We'll call our rule `AddOneInliner` and implement the `OptimizerRule` trait. The `OptimizerRule` trait has two methods: + +- `name` - returns the name of the rule +- `try_optimize` - takes a `LogicalPlan` and returns an `Option<LogicalPlan>`. If the rule is able to optimize the plan, it returns `Some(LogicalPlan)` with the optimized plan. If the rule is not able to optimize the plan, it returns `None`. + +```rust +struct AddOneInliner {} + +impl OptimizerRule for AddOneInliner { + fn name(&self) -> &str { + "add_one_inliner" + } + + fn try_optimize( + &self, + plan: &LogicalPlan, + config: &dyn OptimizerConfig, + ) -> Result<Option<LogicalPlan>> { + // Map over the expressions and rewrite them + let new_expressions = plan + .expressions() + .into_iter() + .map(|expr| rewrite_add_one(expr)) + .collect::<Result<Vec<_>>>()?; + + let inputs = plan.inputs().into_iter().cloned().collect::<Vec<_>>(); + + let plan = plan.with_new_exprs(&new_expressions, &inputs); Review Comment: 👍 ########## docs/source/library-user-guide/working-with-exprs.md: ########## @@ -19,4 +19,166 @@ # Working with Exprs -Coming Soon +<!-- https://github.com/apache/arrow-datafusion/issues/7304 --> + +`Expr` is short for "expression". It is a core abstraction in DataFusion for representing a computation, and follows the standard "expression tree" abstraction found in most compilers and databases. + +For example, the SQL expression `a + b` would be represented as an `Expr` with a `BinaryExpr` variant. A `BinaryExpr` has a left and right `Expr` and an operator. + +As another example, the SQL expression `a + b * c` would be represented as an `Expr` with a `BinaryExpr` variant. The left `Expr` would be `a` and the right `Expr` would be another `BinaryExpr` with a left `Expr` of `b` and a right `Expr` of `c`. As a classic expression tree, this would look like: + +```text + ┌────────────────────┐ + │ BinaryExpr │ + │ op: + │ + └────────────────────┘ + ▲ ▲ + ┌───────┘ └────────────────┐ + │ │ +┌────────────────────┐ ┌────────────────────┐ +│ Expr::Col │ │ BinaryExpr │ +│ col: a │ │ op: * │ +└────────────────────┘ └────────────────────┘ + ▲ ▲ + ┌────────┘ └─────────┐ + │ │ + ┌────────────────────┐ ┌────────────────────┐ + │ Expr::Col │ │ Expr::Col │ + │ col: b │ │ col: c │ + └────────────────────┘ └────────────────────┘ +``` + +As the writer of a library, you may want to use or create `Expr`s to represent computations that you want to perform. This guide will walk you through how to make your own scalar UDF as an `Expr` and how to rewrite `Expr`s to inline the simple UDF. + +There are also executable examples for working with `Expr`s: + +- [rewrite_expr.rs](../../../datafusion-examples/examples/catalog.rs) +- [expr_api.rs](../../../datafusion-examples/examples/expr_api.rs) + +## A Scalar UDF Example + +We'll use a `ScalarUDF` expression as our example. This necessitates implementing an actual UDF, and for ease we'll use the same example from the [adding UDFs](./adding-udfs.md) guide. + +So assuming you've written that function, you can use it to create an `Expr`: + +```rust +let add_one_udf = create_udf( + "add_one", + vec![DataType::Int64], + Arc::new(DataType::Int64), + Volatility::Immutable, + make_scalar_function(add_one), // <-- the function we wrote +); + +// make the expr `add_one(5)` +let expr = add_one_udf.call(vec![lit(5)]); + +// make the expr `add_one(my_column)` +let expr = add_one_udf.call(vec![col("my_column")]); +``` + +If you'd like to learn more about `Expr`s, before we get into the details of creating and rewriting them, you can read the [expression user-guide](./../user-guide/expressions.md). + +## Rewriting Exprs + +Rewriting Expressions is the process of taking an `Expr` and transforming it into another `Expr`. This is useful for a number of reasons, including: + +- Simplifying `Expr`s to make them easier to evaluate +- Optimizing `Expr`s to make them faster to evaluate +- Converting `Expr`s to other forms, e.g. converting a `BinaryExpr` to a `CastExpr` + +In our example, we'll use rewriting to update our `add_one` UDF, to be rewritten as a `BinaryExpr` with a `Literal` of 1. We're effectively inlining the UDF. + +### Rewriting with `transform` + +To implement the inlining, we'll need to write a function that takes an `Expr` and returns a `Result<Expr>`. If the expression is _not_ to be rewritten `Transformed::No` is used to wrap the original `Expr`. If the expression _is_ to be rewritten, `Transformed::Yes` is used to wrap the new `Expr`. + +```rust +fn rewrite_add_one(expr: Expr) -> Result<Expr> { + expr.transform(&|expr| { + Ok(match expr { + Expr::ScalarUDF(scalar_fun) if scalar_fun.fun.name == "add_one" => { + let input_arg = scalar_fun.args[0].clone(); + let new_expression = input_arg + lit(1i64); + + Transformed::Yes(new_expression) + } + _ => Transformed::No(expr), + }) + }) +} +``` + +### Creating an `OptimizerRule` + +In DataFusion, an `OptimizerRule` is a trait that supports rewriting`Expr`s that appear in various parts of the `LogicalPlan`. It follows DataFusion's general mantra of trait implementations to drive behavior. + +We'll call our rule `AddOneInliner` and implement the `OptimizerRule` trait. The `OptimizerRule` trait has two methods: + +- `name` - returns the name of the rule +- `try_optimize` - takes a `LogicalPlan` and returns an `Option<LogicalPlan>`. If the rule is able to optimize the plan, it returns `Some(LogicalPlan)` with the optimized plan. If the rule is not able to optimize the plan, it returns `None`. + +```rust +struct AddOneInliner {} + +impl OptimizerRule for AddOneInliner { + fn name(&self) -> &str { + "add_one_inliner" + } + + fn try_optimize( + &self, + plan: &LogicalPlan, + config: &dyn OptimizerConfig, + ) -> Result<Option<LogicalPlan>> { + // Map over the expressions and rewrite them + let new_expressions = plan + .expressions() + .into_iter() + .map(|expr| rewrite_add_one(expr)) + .collect::<Result<Vec<_>>>()?; + + let inputs = plan.inputs().into_iter().cloned().collect::<Vec<_>>(); + + let plan = plan.with_new_exprs(&new_expressions, &inputs); Review Comment: 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
