alamb opened a new issue, #2874:
URL: https://github.com/apache/arrow-datafusion/issues/2874
**Describe the bug**
Various parts of the DataFusion codebase assume that converting between
`ScalarValue` <--> `Array` preserves the data type. This would seem to
be a reasonable assumption, however it does not hold for at least
`DictionaryArray`s.
For example, consider a `ScalarValue` that is converted to an array, `cast` to a
`DictionaryArray<_>` due to coercion rules, and then converted back to a
`ScalarValue`. When that supposedly cast `ScalarValue` is converted back to an
`Array`, it does not maintain its Dictionary encoding; instead the array has the
dictionary's plain value type (for example `DataType::Utf8` for a string dictionary).
**To Reproduce**
```rust
// Import paths assume the `datafusion` crate and its `arrow` re-export
use datafusion::arrow::compute::kernels;
use datafusion::arrow::datatypes::DataType;
use datafusion::scalar::ScalarValue;

fn bad_cast() {
    // There is a problem with round trip casting to/from a dictionary
    // array. It is desired to cast this ScalarValue to a Dictionary
    // (for coercion, for example)
    let scalar = ScalarValue::Utf8(Some("foo".to_string()));
    let desired_type = DataType::Dictionary(
        // key type
        Box::new(DataType::Int32),
        // value type
        Box::new(DataType::UInt8),
    );
    // convert from scalar --> Array to call cast
    let scalar_array = scalar.to_array();
    // cast the actual value
    let cast_array = kernels::cast::cast(&scalar_array, &desired_type).unwrap();
    // turn it back to a scalar
    let cast_scalar = ScalarValue::try_from_array(&cast_array, 0).unwrap();
    // Some time later the "cast" scalar is turned back into an array:
    let array = cast_scalar.to_array_of_size(10);
    // The datatype should be "Dictionary" but is actually the plain
    // value type (UInt8 here)!!!
    assert_eq!(array.data_type(), &desired_type)
}
```
Running this function results in
```
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `UInt8`,
right: `Dictionary(Int32, UInt8)`', src/main.rs:76:5
```
**Expected behavior**
Test case should pass
**Additional context**
I am not sure if it makes sense to add a `ScalarValue::Dictionary`
variant, or perhaps add an `is_dictionary` flag or something else, or maybe even
just not assume a `ScalarValue` can be round tripped and maintain its data type.
This is the root cause of
https://github.com/apache/arrow-datafusion/issues/2873. I added a patch for
that particular case, but this problem can occur elsewhere.
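To make the first option more concrete, here is a rough, self-contained sketch of how a dictionary-aware scalar variant could keep the data type across the round trip. Everything in it (`DataType`, `Scalar`, `Array`, `cast`, `round_trip_type`) is a simplified, hypothetical stand-in written for illustration; it is not the arrow or DataFusion API, and the toy `cast` only relabels the array's type.

```rust
// Toy model only: hypothetical stand-ins, NOT the arrow / DataFusion types.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    // Hypothetical variant: remembers the key type and wraps the value scalar.
    Dictionary(Box<DataType>, Box<Scalar>),
}

// Minimal "array": a data type plus its (string) values.
#[derive(Debug, Clone, PartialEq)]
struct Array {
    data_type: DataType,
    values: Vec<Option<String>>,
}

impl Scalar {
    // The scalar reports a Dictionary type when it came from a dictionary.
    fn data_type(&self) -> DataType {
        match self {
            Scalar::Utf8(_) => DataType::Utf8,
            Scalar::Dictionary(key, value) => {
                DataType::Dictionary(key.clone(), Box::new(value.data_type()))
            }
        }
    }

    fn value(&self) -> Option<String> {
        match self {
            Scalar::Utf8(v) => v.clone(),
            Scalar::Dictionary(_, inner) => inner.value(),
        }
    }

    // scalar --> array, carrying the scalar's full data type along.
    fn to_array_of_size(&self, size: usize) -> Array {
        Array {
            data_type: self.data_type(),
            values: vec![self.value(); size],
        }
    }

    // array --> scalar, keeping the dictionary wrapper if there is one.
    fn try_from_array(array: &Array, index: usize) -> Option<Scalar> {
        let value = array.values.get(index)?.clone();
        Self::from_type_and_value(&array.data_type, value)
    }

    fn from_type_and_value(data_type: &DataType, value: Option<String>) -> Option<Scalar> {
        match data_type {
            DataType::Utf8 => Some(Scalar::Utf8(value)),
            DataType::Dictionary(key, value_type) => {
                let inner = Self::from_type_and_value(value_type, value)?;
                Some(Scalar::Dictionary(key.clone(), Box::new(inner)))
            }
            // Int32 values are not modeled in this sketch.
            DataType::Int32 => None,
        }
    }
}

// Stand-in for a cast kernel: it only relabels the data type.
fn cast(array: &Array, to: &DataType) -> Array {
    Array {
        data_type: to.clone(),
        values: array.values.clone(),
    }
}

// Same steps as the reproducer: scalar -> array -> cast -> scalar -> array.
fn round_trip_type(desired: &DataType) -> DataType {
    let scalar = Scalar::Utf8(Some("foo".to_string()));
    let cast_array = cast(&scalar.to_array_of_size(1), desired);
    let cast_scalar = Scalar::try_from_array(&cast_array, 0).expect("type not modeled");
    cast_scalar.to_array_of_size(10).data_type
}
```

In this toy model, calling `round_trip_type` with a `Dictionary(Int32, Utf8)` type gives back that same type, because the scalar itself carries the key type. The trade-off is that every `match` over the scalar enum would need to handle the extra variant.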