kosiew commented on code in PR #20907:
URL: https://github.com/apache/datafusion/pull/20907#discussion_r2929282085
##########
datafusion/common/src/nested_struct.rs:
##########
@@ -1010,4 +1173,18 @@ mod tests {
assert!(b_col.is_null(0));
assert!(b_col.is_null(1));
}
+
Review Comment:
The new functionality adds fairly intricate `ListView` and dictionary
rebuild logic, but the new tests exercise only the `List<Struct>` execution
path and dictionary compatibility validation.
Do you think It would be worth adding tests for `Dictionary<_, Struct>` and
for `ListView<Struct>`?
##########
datafusion/common/src/nested_struct.rs:
##########
@@ -165,18 +170,139 @@ fn cast_struct_column(
/// - Invalid data type combinations are encountered
pub fn cast_column(
source_col: &ArrayRef,
- target_field: &Field,
+ target_type: &DataType,
cast_options: &CastOptions,
) -> Result<ArrayRef> {
- match target_field.data_type() {
- Struct(target_fields) => {
+ match (source_col.data_type(), target_type) {
+ (_, Struct(target_fields)) => {
Review Comment:
After this patch, runtime support exists for some nested wrappers around
structs, but planning is still using the old top-level check
` } else if matches!((&expr_type, &cast_type), (Struct(_), Struct(_))) {`
at datafusion/physical-expr/src/expressions/cast.rs:326
##########
datafusion/common/src/scalar/mod.rs:
##########
@@ -3905,15 +3905,11 @@ impl ScalarValue {
// and recursively casts nested structs. The field name wrapper is
arbitrary
// since cast_column only uses the DataType::Struct field definitions
inside.
let cast_arr = match target_type {
- DataType::Struct(_) => {
- // Field name is unused; only the struct's inner field names
matter
- let target_field = Field::new("_", target_type.clone(), true);
- crate::nested_struct::cast_column(
- &scalar_array,
- &target_field,
- cast_options,
- )?
- }
+ DataType::Struct(_) => crate::nested_struct::cast_column(
Review Comment:
This match DataType::Struct cast entry point and at
datafusion/expr-common/src/columnar_value.rs:317
mean casts such as `List<Struct{...}> -> List<Struct{..., extra}>` and
`Dictionary<_, Struct{...}> -> Dictionary<_, Struct{..., extra}>` still fall
back to Arrow's generic `cast_with_options`.
I think we should add a shared predicate like "does this target type require
name-based nested struct casting?" and reuse it in all three places.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]