ion-elgreco opened a new issue, #14218:
URL: https://github.com/apache/datafusion/issues/14218
### Describe the bug
I am rewriting our CDF operation in delta-rs. The code looks roughly like
this:
```rust
let mut projected = if should_cdc {
    operation_count
        .clone()
        .with_column(
            CDC_COLUMN_NAME,
            when(col(TARGET_DELETE_COLUMN).is_null(), lit("delete")) // nulls are equal to True
                .when(col(DELETE_COLUMN).is_null(), lit("source_delete"))
                .when(col(TARGET_COPY_COLUMN).is_null(), lit("copy"))
                .when(col(TARGET_INSERT_COLUMN).is_null(), lit("insert"))
                .when(col(TARGET_UPDATE_COLUMN).is_null(), lit("update"))
                .end()?,
        )?
        // .drop_columns(&["__delta_rs_path"])? // WEIRD bug caused by interaction with unnest_columns, has to be dropped otherwise throws schema error
        .with_column(
            "__delta_rs_update_expanded",
            when(
                col(CDC_COLUMN_NAME).eq(lit("update")),
                lit(ScalarValue::List(ScalarValue::new_list(
                    &[
                        ScalarValue::Utf8(Some("update_preimage".into())),
                        ScalarValue::Utf8(Some("update_postimage".into())),
                    ],
                    &DataType::List(Field::new("element", DataType::Utf8, false).into()),
                    true,
                ))),
            )
            .end()?,
        )?
        .unnest_columns(&["__delta_rs_update_expanded"])?
        .with_column(
            CDC_COLUMN_NAME,
            when(
                col(CDC_COLUMN_NAME).eq(lit("update")),
                col("__delta_rs_update_expanded"),
            )
            .otherwise(col(CDC_COLUMN_NAME))?,
        )?
        .drop_columns(&["__delta_rs_update_expanded"])?
        .select(write_projection_with_cdf)?
```
I noticed that when I call `unnest_columns` on another column, execution
afterwards fails with a schema error:
```
Result::unwrap()` on an `Err` value: Arrow { source: InvalidArgumentError("column types must match schema types, expected Utf8 but found Dictionary(UInt16, Utf8) at column index 7") }
```
Since I don't need the column, I can safely drop it beforehand, but I don't
understand why Dictionary(UInt16, Utf8) isn't simply coerced to Utf8.
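
To illustrate what I mean by coercing: as far as I understand, a dictionary-encoded column holds the same logical strings as a plain Utf8 column, only stored as integer keys plus a table of distinct values. Below is a minimal sketch of that decoding step. The types `DictColumn` and `Utf8Column` are hypothetical toy stand-ins made up for this example, not the actual Arrow or DataFusion arrays.

```rust
// Toy model only: these types are hypothetical and are NOT Arrow's or DataFusion's API.

/// Plain string column, analogous to Utf8.
#[derive(Debug, PartialEq)]
struct Utf8Column(Vec<Option<String>>);

/// Dictionary-encoded string column, analogous to Dictionary(UInt16, Utf8):
/// each row stores a small integer key into a table of distinct values.
#[derive(Debug)]
struct DictColumn {
    keys: Vec<Option<u16>>,
    values: Vec<String>,
}

impl DictColumn {
    /// Decode into a plain column by looking up every key.
    /// Returns None if any key points outside the values table.
    fn to_utf8(&self) -> Option<Utf8Column> {
        let mut out = Vec::with_capacity(self.keys.len());
        for key in &self.keys {
            match key {
                // A null key stays a null row.
                None => out.push(None),
                Some(k) => out.push(Some(self.values.get(*k as usize)?.clone())),
            }
        }
        Some(Utf8Column(out))
    }
}

fn main() {
    let dict = DictColumn {
        keys: vec![Some(0), Some(1), None, Some(0)],
        values: vec!["a.parquet".to_string(), "b.parquet".to_string()],
    };
    let plain = dict.to_utf8().expect("all keys are in range");
    println!("{:?}", plain);
}
```

The logical content is identical before and after decoding, which is why I expected the conversion to happen automatically instead of surfacing as a schema mismatch.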
### To Reproduce
It is a bit difficult to reproduce, but you can grab my branch:
https://github.com/ion-elgreco/delta-rs/tree/refactor--combine_execution_plans
Then run the test `test_merge_cdc_enabled_simple` with this line
commented out: `.drop_columns(&["__delta_rs_path"])?`
### Expected behavior
I guess the Dictionary(UInt16, Utf8) column should be coerced to Utf8 gracefully?
### Additional context
_No response_