Jefffrey commented on code in PR #20012:
URL: https://github.com/apache/datafusion/pull/20012#discussion_r2754663178


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -598,13 +598,32 @@ impl ExprSchemable for Expr {
                         )
                     })?;
 
-                let arguments = args
+                let coerced_values: Vec<Option<ScalarValue>> = args
                     .iter()
-                    .map(|e| match e {
-                        Expr::Literal(sv, _) => Some(sv),
-                        _ => None,
+                    .zip(new_fields.iter())
+                    .map(|(expr, field)| {
+                        let mut current_expr = expr;
+
+                        // Loop to remove all casting layers
+                        loop {
+                            match current_expr {
+                                Expr::Cast(cast) => current_expr = &cast.expr,
+                                Expr::TryCast(cast) => current_expr = 
&cast.expr,
+                                _ => break,
+                            }
+                        }
+
+                        let literal = if let Expr::Literal(sv, _) = 
current_expr {
+                            Some(sv)
+                        } else {
+                            None
+                        };
+
+                        literal.map(|sv| 
sv.cast_to(field.data_type())).transpose()

Review Comment:
   Ah you're right; and it highlights another current inconsistency; when 
`return_field_from_args()` is called in the context of `select_to_plan` (aka 
initially forming the LogicalPlan) it will have `Expr::Literal` passed in for 
scalar arguments to compute the field. However, later in the type coercion pass 
it adds a `Cast` wrapping this and in the current code on main this would 
actually mean we pass in `None` for the scalar argument since we don't unwrap 
the cast as this PR does, potentially leading to an inconsistent result 🤔
   
   
https://github.com/apache/datafusion/blob/66ee0afcabfd59323eeb7b5f76f60a80881bbba3/datafusion/expr/src/expr_schema.rs#L538-L548
   
   For now I don't think this is an actual bug that will crop up (we don't have 
too many UDFs relying on `scalar_arguments` yet) but it is definitely an 
inconsistency to address. It partially relates to #13825 as well since if we 
are able to compute these properties once upfront then it might be easier to 
handle.
   
   I'll try think about it more, but I suppose this is the easier fix for now; 
other ways to fix would be more involved such as:
   
   - #13825 way in reducing the amount of calls to `return_field_from_args()` 
to just once upfront; this does mean we wouldn't need this unwrapping logic so 
calls like `SELECT test_udf(CAST(1 AS INT))` wouldn't work as the argument is 
no longer a literal, it is a cast expression (I think this is intended 
behaviour)
   - Alter type coercion logic to immediately apply cast to literals, instead 
of wrapping in a `Expr::Cast`; similar to how we unwrap the casts here then 
apply the cast, but instead of unwrapping here we rely on type coercion to do 
the cast for us
   
   (I'm not advocating for the above steps, just laying out some of my thoughts 
as I think through this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to