paleolimbot commented on code in PR #18120:
URL: https://github.com/apache/datafusion/pull/18120#discussion_r2466593754


##########
datafusion/expr/src/expr.rs:
##########
@@ -3283,11 +3280,11 @@ impl Display for Expr {
                 }
                 write!(f, "END")
             }
-            Expr::Cast(Cast { expr, data_type }) => {
-                write!(f, "CAST({expr} AS {data_type})")
+            Expr::Cast(Cast { expr, field }) => {
+                write!(f, "CAST({expr} AS {})", field.data_type())

Review Comment:
   There's now a `format_type_and_metadata()` helper in datafusion_common that 
will ensure that casts with metadata are displayed. That display is a bit 
verbose at the moment when metadata exists but it makes it quite a bit easier 
to track down bugs where the metadata was dropped.



##########
datafusion/expr/src/expr.rs:
##########
@@ -3673,7 +3670,7 @@ mod test {
     fn format_cast() -> Result<()> {
         let expr = Expr::Cast(Cast {
             expr: Box::new(Expr::Literal(ScalarValue::Float32(Some(1.23)), 
None)),
-            data_type: DataType::Utf8,
+            field: Arc::new(Field::new("cast", DataType::Utf8, false)),

Review Comment:
   ```suggestion
               field: DataType::Utf8.into_nullable_field_ref(),
   ```
   
   @alamb kindly added a helper for this (there are many places in this PR that 
would benefit!)
   
   An aside is that I think we should ensure that these fields are always 
unnamed for consistency (and that the names of such a field are never used).



##########
datafusion/proto/proto/datafusion.proto:
##########


Review Comment:
   I think many of these changes should be reverted (I don't think this file is 
generated?)



##########
datafusion/proto/proto/datafusion.proto:
##########
@@ -592,12 +540,12 @@ message WhenThen {
 
 message CastNode {
   LogicalExprNode expr = 1;
-  datafusion_common.ArrowType arrow_type = 2;
+  datafusion_common.Field field = 2;

Review Comment:
   It is also possible to add metadata + nullability here to keep the protobuf 
backward compatible (not a requirement but is relatively easy to do)



##########
datafusion/expr/src/expr_fn.rs:
##########
@@ -326,14 +326,75 @@ pub fn rollup(exprs: Vec<Expr>) -> Expr {
     Expr::GroupingSet(GroupingSet::Rollup(exprs))
 }
 
+/// Types that can be used to describe the result of a cast expression.
+pub trait IntoCastField {
+    fn into_cast_field(self, expr: &Expr) -> FieldRef;
+}
+
+impl IntoCastField for FieldRef {
+    fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+        self
+    }
+}
+
+impl IntoCastField for &FieldRef {
+    fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+        Arc::clone(self)
+    }
+}
+
+impl IntoCastField for Field {
+    fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+        Arc::new(self)
+    }
+}
+
+impl IntoCastField for &Field {
+    fn into_cast_field(self, _expr: &Expr) -> FieldRef {
+        Arc::new(self.clone())
+    }
+}
+
+impl IntoCastField for DataType {
+    fn into_cast_field(self, expr: &Expr) -> FieldRef {
+        let nullable = infer_cast_nullability(expr);
+        Arc::new(Field::new("", self, nullable))
+    }
+}
+
+fn infer_cast_nullability(expr: &Expr) -> bool {
+    match expr {
+        Expr::Literal(value, _) => value.is_null(),
+        Expr::Cast(Cast { field, .. }) | Expr::TryCast(TryCast { field, .. }) 
=> {
+            field.is_nullable()
+        }
+        _ => true,
+    }
+}
+
 /// Create a cast expression
-pub fn cast(expr: Expr, data_type: DataType) -> Expr {
-    Expr::Cast(Cast::new(Box::new(expr), data_type))
+pub fn cast<F>(expr: Expr, field: F) -> Expr
+where
+    F: IntoCastField,
+{
+    let field = field.into_cast_field(&expr);
+    Expr::Cast(Cast::new(Box::new(expr), field))

Review Comment:
   I see that a number of references to `cast()` were updated as 
`cast(Arc::new(Field::new(...)))`...I wonder if those can be reverted (in 
theory `IntoCastField` should handle the conversion?)



##########
datafusion/expr/src/expr_fn.rs:
##########
@@ -326,14 +326,75 @@ pub fn rollup(exprs: Vec<Expr>) -> Expr {
     Expr::GroupingSet(GroupingSet::Rollup(exprs))
 }
 
+/// Types that can be used to describe the result of a cast expression.
+pub trait IntoCastField {
+    fn into_cast_field(self, expr: &Expr) -> FieldRef;
+}

Review Comment:
   These are somewhat similar to the `FieldExt` traits that Andrew added to 
datafusion_common/src/datatypes.rs. I think DataFusion would benefit if it had 
a way to say "get me a FieldRef that is actually a data type", but I am also 
not sure this PR is the right place to do that.



##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -579,10 +580,13 @@ impl ExprSchemable for Expr {
                 func.return_field_from_args(args)
             }
             // _ => Ok((self.get_type(schema)?, self.nullable(schema)?)),
-            Expr::Cast(Cast { expr, data_type }) => expr
-                .to_field(schema)
-                .map(|(_, f)| 
f.as_ref().clone().with_data_type(data_type.clone()))
-                .map(Arc::new),
+            Expr::Cast(Cast { expr, field }) | Expr::TryCast(TryCast { expr, 
field }) => {
+                let (_, input_field) = expr.to_field(schema)?;
+                let mut combined_metadata = 
FieldMetadata::from(input_field.metadata());
+                
combined_metadata.extend(FieldMetadata::from(field.metadata()));
+                let field = 
combined_metadata.add_to_field(field.as_ref().clone());
+                Ok(Arc::new(field))

Review Comment:
   I would prefer not to combine metadata here...this would mean that a `Cast` 
for an extension type would basically be a "reinterpret cast" not a "logical 
cast", which is what I believe the `Cast` is trying to represent in a logical 
plan. The `Alias` can currently add metadata in this way which I think is a bit 
fishy but is at least not attempting to represent a logical operation.



##########
datafusion/expr/src/expr.rs:
##########
@@ -794,14 +794,14 @@ pub enum GetFieldAccess {
 pub struct Cast {
     /// The expression being cast
     pub expr: Box<Expr>,
-    /// The `DataType` the expression will yield
-    pub data_type: DataType,
+    /// Field describing the result of the cast, including metadata
+    pub field: FieldRef,
 }
 
 impl Cast {
     /// Create a new Cast expression
-    pub fn new(expr: Box<Expr>, data_type: DataType) -> Self {
-        Self { expr, data_type }
+    pub fn new(expr: Box<Expr>, field: FieldRef) -> Self {
+        Self { expr, field }

Review Comment:
   This change (and the corresponding change for `TryCast::new()` seem to have 
had caused significant disruption...it is probably worth keeping the original 
signature and adding a new function specifically for casts with metadata and/or 
nullability to minimize the disruption in DataFusion and downstream.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to