alamb commented on code in PR #18648:
URL: https://github.com/apache/datafusion/pull/18648#discussion_r2890677270
##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
pub enum PreimageResult {
/// No preimage exists for the specified value
None,
- /// The expression always evaluates to the specified constant
- /// given that `expr` is within the interval
Review Comment:
I don't fully understand this logic
For the example `floor(x) < 8.3` I would expect there to be no preimage as
defined here -- specifically there is no range of inputs for which `floor(x)`
evaluates to `8.3`
I agree that it is valid simplificaition to rewrite `floor(x) < 8.3` to `x <
9.0`, but it seems different than "preimage" 🤔
Maybe we just need to give it a different name (maybe that is what you have
tried to do with `is_boundary`)
##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
pub enum PreimageResult {
/// No preimage exists for the specified value
None,
- /// The expression always evaluates to the specified constant
- /// given that `expr` is within the interval
- Range { expr: Expr, interval: Box<Interval> },
+ /// For some UDF, a `preimage` implementation determines that:
+ /// the result `udf_result` in `udf_result = UDF(expr)`
+ /// is equivalent to `udf_result = UDF(i)` for any `i` in `interval`.
+ ///
+ /// Then, `is_boundary` indicates a boundary condition where:
+ /// the original expression `UDF(expr)` is compared to a value `lit`
where:
+ /// `UDF(lit) == lit`
+ /// This condition is important for two scenarios:
+ /// 1. `<` and `>=` operators:
+ /// if `Some(false)`, expression rewrite should use `interval.upper`
+ /// 2. `=` and `!=` operators:
+ /// if `Some(false)`, expression rewrite can use constant (false and
true, respectively)
+ ///
+ /// if is_boundary is `None`, then the boundary condition never applies.
Review Comment:
While trying to understand this, I wonder if it might be easier to express
by instead of adding a field to `Range` we could instead add a new variant.
Something like this:
```rust
enum PreimageResult {
/// ...
Range { expr: Expr, interval: Box<Interval> },
// The original expression UDF(lit) = lit
// 1. `<` and `>=` operators:
/// if `Some(false)`, expression rewrite should use `interval.upper`
/// 2. `=` and `!=` operators:
/// if `Some(false)`, expression rewrite can use constant (false and
true, respectively)
Boundary { expr: Expr, interval: Box<Interval> },
}
```
##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
pub enum PreimageResult {
/// No preimage exists for the specified value
None,
- /// The expression always evaluates to the specified constant
- /// given that `expr` is within the interval
Review Comment:
BTW I checked with datafusion-cli and the `floor(x) < 8.3` case is not
optimized today (the preimage is not applied here)
```sql
> create OR REPLACE table foo(x float) as values (1.0), (8.0), (9.0);
0 row(s) fetched.
Elapsed 0.002 seconds.
> select * from foo where floor(x) < 8.3;
+-----+
| x |
+-----+
| 1.0 |
| 8.0 |
+-----+
2 row(s) fetched.
> explain select * from foo where floor(x) < 8.3;
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ FilterExec │ |
| | │ -------------------- │ |
| | │ predicate: │ |
| | │ CAST(floor(x) AS Float64) │ |
| | │ < 8.3 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 112 │ |
| | │ format: memory │ |
| | │ rows: 1 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.
> explain format indent select * from foo where floor(x) < 8.3;
+---------------+------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------+
| logical_plan | Filter: CAST(floor(foo.x) AS Float64) < Float64(8.3) |
| | TableScan: foo projection=[x] |
| physical_plan | FilterExec: CAST(floor(x@0) AS Float64) < 8.3 |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.002 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]