paleolimbot commented on code in PR #23169:
URL: https://github.com/apache/datafusion/pull/23169#discussion_r3509321542


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -69,18 +69,43 @@ pub trait ExprSchemable {
     -> Result<(DataType, bool)>;
 }
 
-/// Derives the output field for a cast expression from the source field.
+/// Derives the output field for a cast expression from the source and target fields.
+///
+/// Metadata handling:
+/// - Source field metadata is propagated by default
+/// - Extension type metadata keys (`ARROW:extension:name` and `ARROW:extension:metadata`)
+///   are taken from the target field, overwriting any extension metadata from the source.
+///   This ensures casting does not incorrectly propagate extension type identity.
+///
 /// For `TryCast`, `force_nullable` is `true` since a failed cast returns NULL.
 fn cast_output_field(
     source_field: &FieldRef,
-    target_type: &DataType,
+    target_field: &FieldRef,
     force_nullable: bool,
 ) -> Arc<Field> {
+    use arrow_schema::extension::{EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY};
+
+    // Start with source metadata
+    let mut metadata = source_field.metadata().clone();
+
+    // Remove any extension type metadata from source - these should not propagate through casts
+    metadata.remove(EXTENSION_TYPE_NAME_KEY);
+    metadata.remove(EXTENSION_TYPE_METADATA_KEY);
+
+    // Add extension type metadata from the target field if present
+    let target_metadata = target_field.metadata();
+    if let Some(name) = target_metadata.get(EXTENSION_TYPE_NAME_KEY) {
+        metadata.insert(EXTENSION_TYPE_NAME_KEY.to_string(), name.clone());
+    }
+    if let Some(ext_meta) = target_metadata.get(EXTENSION_TYPE_METADATA_KEY) {
+        metadata.insert(EXTENSION_TYPE_METADATA_KEY.to_string(), ext_meta.clone());
+    }
+

Review Comment:
   I updated this so that it hopefully makes a bit more sense. The previous
behaviour (pass all metadata through, because a cast only modifies the
`DataType`) stays, except that we now strip extension metadata so that
`::VARCHAR` has an output type of `Utf8`. If any metadata is specified, it is
authoritative.
   
   Unfortunately, updating the storage like I did for the physical cast would
be a breaking change here (we maybe shouldn't have used a `FieldRef` 😬), so
"the previous behaviour" is detected somewhat cryptically. Mind you, this
cryptic detection already existed in the physical operator; I just moved it
here.
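
   To make the metadata rule described above concrete, here is a minimal sketch of just the merge step, using plain `HashMap`s instead of `FieldRef` so it runs without the arrow crates. The function name `merge_cast_metadata` is hypothetical and not part of the PR; the two key strings are copied from the doc comment in the diff, and the constants are redefined locally rather than imported from `arrow_schema`.

```rust
use std::collections::HashMap;

// Key names as written in the doc comment in the diff; defined locally
// here (assumption: stand-ins for the `arrow_schema::extension` constants).
const EXTENSION_TYPE_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_TYPE_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical stand-in for the metadata step of `cast_output_field`.
/// Source metadata passes through, except that the two extension keys are
/// always dropped and then re-added only if the target supplies them.
fn merge_cast_metadata(
    source: &HashMap<String, String>,
    target: &HashMap<String, String>,
) -> HashMap<String, String> {
    // Start with the source metadata.
    let mut metadata = source.clone();

    for key in [EXTENSION_TYPE_NAME_KEY, EXTENSION_TYPE_METADATA_KEY] {
        // Extension identity from the source never survives the cast.
        metadata.remove(key);
        // Extension identity from the target, if any, is copied over.
        if let Some(value) = target.get(key) {
            metadata.insert(key.to_string(), value.clone());
        }
    }

    metadata
}
```

   With a source that carries an extension name plus an ordinary key, and a target with no metadata, the result keeps only the ordinary key. That corresponds to the `::VARCHAR` case, where the output should be plain `Utf8`. If the target names an extension type, that name replaces whatever the source had.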



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

