matthewgapp commented on code in PR #5197:
URL: https://github.com/apache/arrow-rs/pull/5197#discussion_r1459224462
##########
arrow-json/src/writer.rs:
##########
@@ -469,11 +469,67 @@ fn set_column_for_json_rows(
row.insert(col_name.to_string(),
serde_json::Value::Object(obj));
}
}
+ DataType::Decimal128(_precision, _scale) |
DataType::Decimal256(_precision, _scale) => {
+ to_json_float(rows, array, col_name, explicit_nulls)?;
+ }
_ => {
return Err(ArrowError::JsonError(format!(
- "data type {:?} not supported in nested map for json writer",
+ "data type {:?} not supported for json writer",
array.data_type()
- )))
+ )));
+ }
+ }
+ Ok(())
+}
+
+fn to_json_float(
+ rows: &mut [Option<JsonMap<String, Value>>],
+ array: &ArrayRef,
+ col_name: &str,
+ explicit_nulls: bool,
+) -> Result<(), ArrowError> {
+ let options = FormatOptions::default();
+ let formatter = ArrayFormatter::try_new(array.as_ref(), &options)?;
Review Comment:
I'd argue that representing decimals as strings makes for a bad JSON
consumption experience, especially when decimals are used for currency related
fields and not necessarily for very large numbers (often just two decimal
places).
Representing decimals as strings presents a large burden on the consumer of
the API:
1. the JSON consumer expects decimals to be json numbers, not strings
2. the JSON consumer must know which fields in the JSON are decimal (and
therefore string) if they want to parse them as a number or they must try-parse
every field as a number because some numbers might be represented as a string.
For example, imagine a parquet file that contains a bunch of financial data
and other attributes on home sales. Some of the financial data is stored as
decimals but I don't know which because I'm consuming the parquet file after
arrow_json has written to JSON. It's then my responsibility to try parse every
field as a number in my client language because we chose to represent them as
strings.
As a workaround, I suggest that we provide the arrow_json user the option to
write decimals to string or to parse to a JSON number.
We could do this through a compile time feature flag or we could expose the
option on the public API. Perhaps we expose some sort of json write struct that
uses builder style for options like this
```rust
/// lossless can either be the default or opt-in
json_rows =
JsonWriter::new(record_batches).with_lossless_decimal(true).write()
```
Alternatively, we could expand the current API signature
```rust
/// Converts an arrow [`RecordBatch`] into a `Vec` of Serde JSON
/// [`JsonMap`]s (objects)
pub fn record_batches_to_json_rows(
batches: &[&RecordBatch],
lossless_decimal: boolean
) -> Result<Vec<JsonMap<String, Value>>, ArrowError> {
//...
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]