Jefffrey commented on code in PR #20012:
URL: https://github.com/apache/datafusion/pull/20012#discussion_r2754663178
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -598,13 +598,32 @@ impl ExprSchemable for Expr {
)
})?;
- let arguments = args
+ let coerced_values: Vec<Option<ScalarValue>> = args
.iter()
- .map(|e| match e {
- Expr::Literal(sv, _) => Some(sv),
- _ => None,
+ .zip(new_fields.iter())
+ .map(|(expr, field)| {
+ let mut current_expr = expr;
+
+ // Loop to remove all casting layers
+ loop {
+ match current_expr {
+ Expr::Cast(cast) => current_expr = &cast.expr,
+ Expr::TryCast(cast) => current_expr =
&cast.expr,
+ _ => break,
+ }
+ }
+
+ let literal = if let Expr::Literal(sv, _) =
current_expr {
+ Some(sv)
+ } else {
+ None
+ };
+
+ literal.map(|sv|
sv.cast_to(field.data_type())).transpose()
Review Comment:
Ah you're right; and it highlights another current inconsistency; when
`return_field_from_args()` is called in the context of `select_to_plan` (aka
initially forming the LogicalPlan) it will have `Expr::Literal` passed in for
scalar arguments to compute the field. However, later in the type coercion pass
it adds a `Cast` wrapping this and in the current code on main this would
actually mean we pass in `None` for the scalar argument since we don't unwrap
the cast as this PR does, potentially leading to an inconsistent result 🤔
https://github.com/apache/datafusion/blob/66ee0afcabfd59323eeb7b5f76f60a80881bbba3/datafusion/expr/src/expr_schema.rs#L538-L548
For now I don't think this is an actual bug that will crop up (we don't have
too many UDFs relying on `scalar_arguments` yet) but it is definitely an
inconsistency to address. It partially relates to #13825 as well since if we
are able to compute these properties once upfront then it might be easier to
handle.
I'll try think about it more, but I suppose this is the easier fix for now;
other ways to fix would be more involved such as:
- #13825 way in reducing the amount of calls to `return_field_from_args()`
to just once upfront; this does mean we wouldn't need this unwrapping logic so
calls like `SELECT test_udf(CAST(1 AS INT))` wouldn't work as the argument is
no longer a literal, it is a cast expression (I think this is intended
behaviour)
- Alter type coercion logic to immediately apply cast to literals, instead
of wrapping in a `Expr::Cast`; similar to how we unwrap the casts here then
apply the cast, but instead of unwrapping here we rely on type coercion to do
the cast for us
(I'm not advocating for the above steps, just laying out some of my thoughts
as I think through this)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]