jonmmease opened a new issue, #5095:
URL: https://github.com/apache/arrow-rs/issues/5095
**Describe the bug**
It looks like the `coerce_primitive` configuration option on `ReaderBuilder`
is not honored when decoding from serde-compatible objects like
`serde_json::Value`.
**To Reproduce**
Add the following test to `arrow-json/src/reader/mod.rs` to trigger the error:
```rust
#[test]
fn test_coercing_primitive_into_string_decoder() {
    let buf = r#"[
        {"a": 1, "b": "A", "c": "T"},
        {"a": 2, "b": "BB", "c": "F"},
        {"a": 3, "b": 123, "c": false}
    ]"#;
    let schema = Schema::new(vec![
        Field::new("a", DataType::Float64, true),
        Field::new("b", DataType::Utf8, true),
        Field::new("c", DataType::Utf8, true),
    ]);
    let json_array: Vec<serde_json::Value> = serde_json::from_str(buf).unwrap();
    let schema_ref = Arc::new(schema);

    // read record batches
    let reader = ReaderBuilder::new(schema_ref.clone()).with_coerce_primitive(true);
    let mut decoder = reader.build_decoder().unwrap();
    decoder.serialize(json_array.as_slice()).unwrap();
    let batch = decoder.flush().unwrap().unwrap();
    println!("{:?}", batch);
}
```
```
called `Result::unwrap()` on an `Err` value: JsonError("whilst decoding field 'b': expected string got 123")
thread 'reader::tests::test_coercing_primitive_into_string_decoder' panicked at 'called `Result::unwrap()` on an `Err` value: JsonError("whilst decoding field 'b': expected string got 123")', arrow-json/src/reader/mod.rs:2284:37
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1651:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1076:23
   4: arrow_json::reader::tests::test_coercing_primitive_into_string_decoder
             at ./src/reader/mod.rs:2284:21
   5: arrow_json::reader::tests::test_coercing_primitive_into_string_decoder::{{closure}}
             at ./src/reader/mod.rs:2264:54
   6: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
**Expected behavior**
I expect the value `123` in column `b` to be converted to a string.
**Additional context**
This worked in version 47. I hit the error when updating to DataFusion
33.0.0 with Arrow 48.0.1, but have also confirmed it on `main`.
My hunch is that this was introduced with
https://github.com/apache/arrow-rs/pull/4861, and that some additional logic is
needed around here
https://github.com/apache/arrow-rs/blob/dc75a280b46149140eca8dd5e18d31cbadf04716/arrow-json/src/reader/string_array.rs#L45-L66
to handle coercing the new TapeElement enumerations (`I64`, `I32`, `F64`,
and `F32`).
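
To illustrate the kind of logic I have in mind, here is a minimal, self-contained sketch. It does not use the real arrow-json types: `Element` and `to_string_cell` are hypothetical stand-ins that only borrow the variant names mentioned above, and the actual decoder works on a tape with offsets rather than owned values. The point is only that the numeric variants would need the same `coerce_primitive` treatment as booleans.

```rust
/// Hypothetical, simplified stand-in for a decoded JSON element.
/// This is NOT arrow-json's `TapeElement`; it only mirrors the variant
/// names referenced in this issue.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Element {
    String(String),
    I64(i64),
    I32(i32),
    F64(f64),
    F32(f32),
    True,
    False,
    Null,
}

/// Convert one element into an optional string cell (hypothetical helper).
/// With `coerce_primitive` set, booleans and numbers are rendered as
/// strings; without it, anything other than a string or null is an error.
fn to_string_cell(elem: &Element, coerce_primitive: bool) -> Result<Option<String>, String> {
    match elem {
        Element::String(s) => Ok(Some(s.clone())),
        Element::Null => Ok(None),
        Element::True if coerce_primitive => Ok(Some("true".to_string())),
        Element::False if coerce_primitive => Ok(Some("false".to_string())),
        // The numeric variants are the ones that appear to be unhandled today.
        Element::I64(n) if coerce_primitive => Ok(Some(n.to_string())),
        Element::I32(n) if coerce_primitive => Ok(Some(n.to_string())),
        Element::F64(n) if coerce_primitive => Ok(Some(n.to_string())),
        Element::F32(n) if coerce_primitive => Ok(Some(n.to_string())),
        other => Err(format!("expected string got {:?}", other)),
    }
}
```

In this sketch, `to_string_cell(&Element::I64(123), true)` yields the string `123`, while the same call with `coerce_primitive` set to `false` returns an error, which matches the behavior I would expect from the real decoder.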