drin commented on code in PR #20008:
URL: https://github.com/apache/datafusion/pull/20008#discussion_r2729320568
##########
datafusion/expr/src/udf.rs:
##########
@@ -709,20 +709,49 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send +
Sync {
Ok(ExprSimplifyResult::Original(args))
}
- /// Returns the [preimage] for this function and the specified scalar
value, if any.
+ /// Returns a single contiguous preimage for this function and the
specified
+ /// scalar expression, if any.
///
- /// A preimage is a single contiguous [`Interval`] of values where the
function
- /// will always return `lit_value`
+ /// # Return Value
///
- /// Implementations should return intervals with an inclusive lower bound
and
- /// exclusive upper bound.
+ /// Implementations should return a half-open interval: inclusive lower
+ /// bound and exclusive upper bound. Note that this is slightly different
+ /// from normal [`Interval`] semantics where the upper bound is closed. The
+ /// upper endpoint should be adjusted to the next value.
///
- /// This rewrite is described in the [ClickHouse Paper] and is particularly
- /// useful for simplifying expressions `date_part` or equivalent
functions. The
- /// idea is that if you have an expression like `date_part(YEAR, k) =
2024` and you
- /// can find a [preimage] for `date_part(YEAR, k)`, which is the range of
dates
- /// covering the entire year of 2024. Thus, you can rewrite the expression
to `k
- /// >= '2024-01-01' AND k < '2025-01-01' which is often more optimizable.
+ /// # Background
+ ///
+ /// A [preimage] here is a single contiguous [`Interval`] of the function's
+ /// argument(s) where the function will return a single literal (constant)
+ /// value. This can also be thought of as a form of interval containment.
+ ///
+ /// Using a preimage to rewrite predicates is described in the [ClickHouse
+ /// Paper]:
+ ///
+ /// > some functions can compute the preimage of a given function result.
+ /// > This is used to replace comparisons of constants with function calls
+ /// > on the key columns by comparing the key column value with the
preimage.
+ /// > For example, `toYear(k) = 2024` can be replaced by
+ /// > `k >= 2024-01-01 && k < 2025-01-01`
+ ///
+ /// As mentioned above, the preimage can be used to simplify certain types
of
+ /// expressions such as `date_part` into a form that is more optimizable.
+ ///
+ /// For example, given an expression like
+ /// ```sql
+ /// date_part(YEAR, k) = 2024
+ /// ```
+ ///
+ /// There is a single preimage [`2024-01-01`, `2025-01-01`), which is the
+ /// range of dates covering the entire year of 2024 for which
+ /// `date_part(YEAR, k)` evaluates to `2024`. Using this preimage the
+ /// expression can be rewritten to
Review Comment:
This is an attempt to more clearly use the [image of an
element](https://en.wikipedia.org/wiki/Image_(mathematics)#Image_of_an_element)
definition rather than the [image of a function
definition](https://en.wikipedia.org/wiki/Image_(mathematics)#Image_of_a_function).
In this specific example, `2024` is a value of interest output by
`date_part(YEAR, k)` and so we are concerned with it as an element for which we
want to know preimage values. Whereas the image of `date_part(YEAR, k)` is any
year (which is any value in the set of integers).
This is probably pedantic, but as long as we are using `image` and
`preimage` terminology at all, I think it makes sense to be properly specific.
And, according to wikipedia:
> In these definitions, f : X → Y is a function from the set X to the set Y.
Thus, either `date_part` is the function and x is `(YEAR, k)` or
`date_part(YEAR)` is the function and x is `k`.
I think all of the terminology is cleanest if we think about the function
being `date_part(YEAR)` and the input being a scalar date timestamp and the
output being a scalar date timestamp and set X being a set of timestamps and
the set Y being a set of timestamps (which has a cardinality of 1 for this
function).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]