findepi commented on code in PR #15110:
URL: https://github.com/apache/datafusion/pull/15110#discussion_r1989510726
##########
datafusion/optimizer/src/simplify_expressions/unwrap_cast.rs:
##########
@@ -177,6 +192,45 @@ pub(super) fn
is_cast_expr_and_support_unwrap_cast_in_comparison_for_inlist<
true
}
+fn cast_literal_to_type_with_op(
+ lit_value: &ScalarValue,
+ target_type: &DataType,
+ op: Operator,
+) -> Option<ScalarValue> {
+ macro_rules! cast_or_else_return_none {
+ ($value:ident, $ty:expr) => {{
+ let opts = arrow::compute::CastOptions {
+ safe: false,
+ format_options: Default::default(),
+ };
+ let array = ScalarValue::to_array($value).ok()?;
+ let casted = arrow::compute::cast_with_options(&array, &$ty,
&opts).ok()?;
+ let scalar = ScalarValue::try_from_array(&casted, 0).ok()?;
+ Some(scalar)
+ }};
+ }
+
+ match (op, lit_value) {
+ (
+ Operator::Eq | Operator::NotEq,
+ ScalarValue::Utf8(Some(_))
+ | ScalarValue::Utf8View(Some(_))
+ | ScalarValue::LargeUtf8(Some(_)),
+ ) => match target_type {
+ DataType::Int8 => cast_or_else_return_none!(lit_value,
DataType::Int8),
+ DataType::Int16 => cast_or_else_return_none!(lit_value,
DataType::Int16),
+ DataType::Int32 => cast_or_else_return_none!(lit_value,
DataType::Int32),
+ DataType::Int64 => cast_or_else_return_none!(lit_value,
DataType::Int64),
+ DataType::UInt8 => cast_or_else_return_none!(lit_value,
DataType::UInt8),
+ DataType::UInt16 => cast_or_else_return_none!(lit_value,
DataType::UInt16),
+ DataType::UInt32 => cast_or_else_return_none!(lit_value,
DataType::UInt32),
+ DataType::UInt64 => cast_or_else_return_none!(lit_value,
DataType::UInt64),
+ _ => None,
+ },
+ _ => None,
Review Comment:
As it's written, and as the function is named, this looks like generic
approach that will work many different source/target type pairs, we just need
to add logic.
However, the exact checks necessary for soundness will likely differ.
For casts between exact numbers (ints and decimals) and string types, the
https://github.com/apache/datafusion/pull/15110#issuecomment-2714474220 looks
like a sound plan.
Plus, we could also simplify impossible target literals like this
- `cast(an_int as string) = 'abc'` or `cast(an_int as string) = '007'` to
`if(an_int is null, null, false)`
For casts between numbers and numbers, we likely need to recognize max
addressible values, and cast impreciseness, especially when we want to handle
`<`, `<=`, `>`, `>=` operators.
https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L199-L354
might serve as a good reference. Note how it handles NaN, max values, lossy
round-tripping, sometimes converting `<` to `<=` as necessary.
For e.g cast from timestamp to date, the logic will likely be different again
https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L357-L389.
With all this in mind, I'd suggest structuring this code so it's clear it
addresses only the exact number and string case, by checking expr and literal
types and delegating to a function that has "int" and "utf8" or "string" in the
name.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]