drin commented on code in PR #20008:
URL: https://github.com/apache/datafusion/pull/20008#discussion_r2729320568


##########
datafusion/expr/src/udf.rs:
##########
@@ -709,20 +709,49 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + 
Sync {
         Ok(ExprSimplifyResult::Original(args))
     }
 
-    /// Returns the [preimage] for this function and the specified scalar 
value, if any.
+    /// Returns a single contiguous preimage for this function and the 
specified
+    /// scalar expression, if any.
     ///
-    /// A preimage is a single contiguous [`Interval`] of values where the 
function
-    /// will always return `lit_value`
+    /// # Return Value
     ///
-    /// Implementations should return intervals with an inclusive lower bound 
and
-    /// exclusive upper bound.
+    /// Implementations should return a half-open interval: inclusive lower
+    /// bound and exclusive upper bound. Note that this is slightly different
+    /// from normal [`Interval`] semantics where the upper bound is closed. The
+    /// upper endpoint should be adjusted to the next value.
     ///
-    /// This rewrite is described in the [ClickHouse Paper] and is particularly
-    /// useful for simplifying expressions `date_part` or equivalent 
functions. The
-    /// idea is that if you have an expression like `date_part(YEAR, k) = 
2024` and you
-    /// can find a [preimage] for `date_part(YEAR, k)`, which is the range of 
dates
-    /// covering the entire year of 2024. Thus, you can rewrite the expression 
to `k
-    /// >= '2024-01-01' AND k < '2025-01-01' which is often more optimizable.
+    /// # Background
+    ///
+    /// A [preimage] here is a single contiguous [`Interval`] of the function's
+    /// argument(s) where the function will return a single literal (constant)
+    /// value. This can also be thought of as a form of interval containment.
+    ///
+    /// Using a preimage to rewrite predicates is described in the [ClickHouse
+    /// Paper]:
+    ///
+    /// > some functions can compute the preimage of a given function result.
+    /// > This is used to replace comparisons of constants with function calls
+    /// > on the key columns by comparing the key column value with the 
preimage.
+    /// > For example, `toYear(k) = 2024` can be replaced by
+    /// > `k >= 2024-01-01 && k < 2025-01-01`
+    ///
+    /// As mentioned above, the preimage can be used to simplify certain types 
of
+    /// expressions such as `date_part` into a form that is more optimizable.
+    ///
+    /// For example, given an expression like
+    /// ```sql
+    /// date_part(YEAR, k) = 2024
+    /// ```
+    ///
+    /// There is a single preimage [`2024-01-01`, `2025-01-01`), which is the
+    /// range of dates covering the entire year of 2024 for which
+    /// `date_part(YEAR, k)` evaluates to `2024`. Using this preimage the
+    /// expression can be rewritten to

Review Comment:
   This is an attempt to more clearly use the [image of an 
element](https://en.wikipedia.org/wiki/Image_(mathematics)#Image_of_an_element) 
definition rather than the [image of a function 
definition](https://en.wikipedia.org/wiki/Image_(mathematics)#Image_of_a_function).
   
   In this specific example, `2024` is a value of interest output by 
`date_part(YEAR, k)` and so we are concerned with it as an element for which we 
want to know preimage values. Whereas the image of `date_part(YEAR, k)` is any 
year (which is any value in the set of integers).
   
   This is probably pedantic, but as long as we are using `image` and 
`preimage` terminology at all, I think it makes sense to be properly specific. 
And, according to wikipedia:
   > In these definitions, f : X → Y is a function from the set X to the set Y.
   
   Thus, either `date_part` is the function and x is `(YEAR, k)` or 
`date_part(YEAR)` is the function and x is `k`.
   
   I think all of the terminology is cleanest if we think about the function 
being `date_part(YEAR)` and the input being a scalar date timestamp and the 
output being a scalar date timestamp and set X being a set of timestamps and 
the set Y being a set of timestamps (which has a cardinality of 1 for this 
function).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to