alamb commented on code in PR #18648:
URL: https://github.com/apache/datafusion/pull/18648#discussion_r2890677270


##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
 pub enum PreimageResult {
     /// No preimage exists for the specified value
     None,
-    /// The expression always evaluates to the specified constant
-    /// given that `expr` is within the interval

Review Comment:
   I don't fully understand this logic 
   
   For the example `floor(x) < 8.3`  I would expect there to be no preimage as 
defined here -- specifically there is no range of inputs for which `floor(x)` 
evaluates to `8.3`
   
   I agree that it is valid simplificaition to rewrite `floor(x) < 8.3` to `x < 
9.0`, but it seems different than "preimage" 🤔 
   
   Maybe we just need to give it a different name (maybe that is what you have 
tried to do with `is_boundary`)
   
   



##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
 pub enum PreimageResult {
     /// No preimage exists for the specified value
     None,
-    /// The expression always evaluates to the specified constant
-    /// given that `expr` is within the interval
-    Range { expr: Expr, interval: Box<Interval> },
+    /// For some UDF, a `preimage` implementation determines that:
+    ///     the result `udf_result` in `udf_result = UDF(expr)`
+    ///     is equivalent to `udf_result = UDF(i)` for any `i` in `interval`.
+    ///
+    /// Then, `is_boundary` indicates a boundary condition where:
+    ///     the original expression `UDF(expr)` is compared to a value `lit` 
where:
+    ///         `UDF(lit) == lit`
+    /// This condition is important for two scenarios:
+    /// 1. `<` and `>=` operators:
+    ///    if `Some(false)`, expression rewrite should use `interval.upper`
+    /// 2. `=` and `!=` operators:
+    ///    if `Some(false)`, expression rewrite can use constant (false and 
true, respectively)
+    ///
+    /// if is_boundary is `None`, then the boundary condition never applies.

Review Comment:
   While trying to understand this, I wonder if it might be easier to express 
by instead of adding a field to `Range`  we could instead add a new variant. 
Something like this:
   
   ```rust
   enum PreimageResult {
     /// ...
     Range { expr: Expr, interval: Box<Interval> }, 
     // The original expression UDF(lit) = lit
     // 1. `<` and `>=` operators:
     ///    if `Some(false)`, expression rewrite should use `interval.upper`
     /// 2. `=` and `!=` operators:
     ///    if `Some(false)`, expression rewrite can use constant (false and 
true, respectively)
     Boundary { expr: Expr, interval: Box<Interval> }, 
   }
   ```



##########
datafusion/expr/src/preimage.rs:
##########
@@ -23,7 +23,23 @@ use crate::Expr;
 pub enum PreimageResult {
     /// No preimage exists for the specified value
     None,
-    /// The expression always evaluates to the specified constant
-    /// given that `expr` is within the interval

Review Comment:
   BTW I checked with datafusion-cli and the `floor(x) < 8.3` case is not 
optimized today (the preimage is not applied here)
   
   ```sql
   > create OR REPLACE table foo(x float) as values (1.0), (8.0), (9.0);
   0 row(s) fetched.
   Elapsed 0.002 seconds.
   
   > select * from foo where floor(x) < 8.3;
   +-----+
   | x   |
   +-----+
   | 1.0 |
   | 8.0 |
   +-----+
   2 row(s) fetched.
   
   > explain select * from foo where floor(x) < 8.3;
   +---------------+-------------------------------+
   | plan_type     | plan                          |
   +---------------+-------------------------------+
   | physical_plan | ┌───────────────────────────┐ |
   |               | │         FilterExec        │ |
   |               | │    --------------------   │ |
   |               | │         predicate:        │ |
   |               | │ CAST(floor(x) AS Float64) │ |
   |               | │           < 8.3           │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      │ |
   |               | │    --------------------   │ |
   |               | │         bytes: 112        │ |
   |               | │       format: memory      │ |
   |               | │          rows: 1          │ |
   |               | └───────────────────────────┘ |
   |               |                               |
   +---------------+-------------------------------+
   1 row(s) fetched.
   Elapsed 0.008 seconds.
   
   > explain format indent select * from foo where floor(x) < 8.3;
   +---------------+------------------------------------------------------+
   | plan_type     | plan                                                 |
   +---------------+------------------------------------------------------+
   | logical_plan  | Filter: CAST(floor(foo.x) AS Float64) < Float64(8.3) |
   |               |   TableScan: foo projection=[x]                      |
   | physical_plan | FilterExec: CAST(floor(x@0) AS Float64) < 8.3        |
   |               |   DataSourceExec: partitions=1, partition_sizes=[1]  |
   |               |                                                      |
   +---------------+------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.002 seconds.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to