kosiew commented on code in PR #20907:
URL: https://github.com/apache/datafusion/pull/20907#discussion_r2929282085


##########
datafusion/common/src/nested_struct.rs:
##########
@@ -1010,4 +1173,18 @@ mod tests {
         assert!(b_col.is_null(0));
         assert!(b_col.is_null(1));
     }
+

Review Comment:
   The new functionality adds fairly intricate `ListView` and dictionary 
rebuild logic, but the new tests exercise only the `List<Struct>` execution 
path and dictionary compatibility validation. 
   
   Do you think It would be worth adding tests for `Dictionary<_, Struct>` and 
for `ListView<Struct>`?



##########
datafusion/common/src/nested_struct.rs:
##########
@@ -165,18 +170,139 @@ fn cast_struct_column(
 /// - Invalid data type combinations are encountered
 pub fn cast_column(
     source_col: &ArrayRef,
-    target_field: &Field,
+    target_type: &DataType,
     cast_options: &CastOptions,
 ) -> Result<ArrayRef> {
-    match target_field.data_type() {
-        Struct(target_fields) => {
+    match (source_col.data_type(), target_type) {
+        (_, Struct(target_fields)) => {

Review Comment:
   After this patch, runtime support exists for some nested wrappers around 
structs, but planning is still using the old top-level check 
   `    } else if matches!((&expr_type, &cast_type), (Struct(_), Struct(_))) {`
   at datafusion/physical-expr/src/expressions/cast.rs:326



##########
datafusion/common/src/scalar/mod.rs:
##########
@@ -3905,15 +3905,11 @@ impl ScalarValue {
         // and recursively casts nested structs. The field name wrapper is 
arbitrary
         // since cast_column only uses the DataType::Struct field definitions 
inside.
         let cast_arr = match target_type {
-            DataType::Struct(_) => {
-                // Field name is unused; only the struct's inner field names 
matter
-                let target_field = Field::new("_", target_type.clone(), true);
-                crate::nested_struct::cast_column(
-                    &scalar_array,
-                    &target_field,
-                    cast_options,
-                )?
-            }
+            DataType::Struct(_) => crate::nested_struct::cast_column(

Review Comment:
   This match DataType::Struct  cast entry point and at 
   datafusion/expr-common/src/columnar_value.rs:317
   mean casts such as `List<Struct{...}> -> List<Struct{..., extra}>` and 
`Dictionary<_, Struct{...}> -> Dictionary<_, Struct{..., extra}>` still fall 
back to Arrow's generic `cast_with_options`.
   
   I think we should add a shared predicate like "does this target type require 
name-based nested struct casting?" and reuse it in all three places.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to