jagill opened a new issue, #6558:
URL: https://github.com/apache/arrow-rs/issues/6558
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Currently, to deserialize a StructArray from JSON, you need to use a JSON
Object. E.g., deserializing a Struct<a: i32, b: string> would need something
like `{"a": 1, "b": "c"}`. This is also true of top-level RecordBatches. Some
services, such as Presto and Trino, serialize ROW fields as lists. The example
above would be serialized as `[1, "c"]`. If you already know the schema, this
is a more compact representation that reduces the data on the wire.
I would like the ability for arrow_json to deserialize these list-encoded
structs and record batches, perhaps under an option flag.
**Describe the solution you'd like**
When a StructArrayDecoder encounters a `[` (around
[here](https://github.com/apache/arrow-rs/blob/master/arrow-json/src/reader/struct_array.rs#L75)),
it switches to a parsing mode that does not look for field names, and requires
a closing `]` for completion. It should return a parsing error if the number
of entries in the list differs from the number of fields in the struct, or if
any of the sub-parsers encounters the wrong type. This requires the list
entries to appear in the same order as the struct's fields, whereas the
current object parsing matches fields by name and so accepts them in any order.
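The positional rule described above can be sketched in plain Rust. This is only an illustration of the intended checks, not arrow_json code: `Json`, `FieldType`, `DecodeError`, and `check_row` are all hypothetical names, and the real decoder works on arrow_json's internal representation rather than on a value enum like this.

```rust
/// Hypothetical stand-in for a parsed JSON value (not an arrow_json type).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Int(i32),
    Str(String),
    List(Vec<Json>),
}

/// Hypothetical stand-in for the Arrow data type of a struct field.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Int32,
    Utf8,
}

/// The failure cases named in the proposal, plus "not a list at all".
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    ExpectedList,
    WrongLength { expected: usize, found: usize },
    WrongType { field: String },
}

/// Validate one list-encoded row against the struct's fields, strictly by position.
pub fn check_row(fields: &[(&str, FieldType)], row: &Json) -> Result<(), DecodeError> {
    // A list-encoded struct must start with `[`, i.e. be a list.
    let items = match row {
        Json::List(items) => items,
        _ => return Err(DecodeError::ExpectedList),
    };
    // The number of entries must equal the number of fields.
    if items.len() != fields.len() {
        return Err(DecodeError::WrongLength {
            expected: fields.len(),
            found: items.len(),
        });
    }
    // Entry i is checked against field i; names are never looked up.
    for ((name, ty), item) in fields.iter().zip(items) {
        let ok = matches!(
            (ty, item),
            (FieldType::Int32, Json::Int(_)) | (FieldType::Utf8, Json::Str(_))
        );
        if !ok {
            return Err(DecodeError::WrongType { field: name.to_string() });
        }
    }
    Ok(())
}
```

With fields `a: Int32, b: Utf8`, the row `[1, "c"]` passes, `[1]` is rejected for its length, and `["c", 1]` is rejected at field `a` because the order matters.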
**Describe alternatives you've considered**
I currently parse the results with serde_json, then recursively walk the
JSON, converting Lists to Objects using the schema. I then re-serialize the
top-level JSON and read it using arrow_json. This is not very efficient.
Alternatively, I could reproduce an inferior copy of the version of
arrow_json that deserialized using serde_json, either by writing serde_json
decoders directly or by taking a serde_json::Value and populating the
ArrayBuilders myself. This is a lot of code duplication.
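For reference, the first workaround (rewriting lists into objects using the schema) looks roughly like the sketch below. It uses hand-rolled `Value` and `Schema` enums so that it stays self-contained. Both are hypothetical simplifications: the real code operates on `serde_json::Value` and an Arrow schema, and `lists_to_objects` is an illustrative name.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed JSON value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical, simplified schema: a value is either left alone or is a struct.
#[derive(Debug, Clone)]
pub enum Schema {
    Scalar,
    Struct(Vec<(String, Schema)>),
}

/// Recursively rewrite list-encoded structs into objects keyed by field name.
pub fn lists_to_objects(value: Value, schema: &Schema) -> Result<Value, String> {
    match (schema, value) {
        // Non-struct values pass through untouched.
        (Schema::Scalar, v) => Ok(v),
        // A struct arrives as a list: pair entry i with field i.
        (Schema::Struct(fields), Value::List(items)) => {
            if items.len() != fields.len() {
                return Err(format!(
                    "expected {} entries, found {}",
                    fields.len(),
                    items.len()
                ));
            }
            let mut object = BTreeMap::new();
            for ((name, child), item) in fields.iter().zip(items) {
                object.insert(name.clone(), lists_to_objects(item, child)?);
            }
            Ok(Value::Object(object))
        }
        // Anything other than a list cannot be a list-encoded struct.
        (Schema::Struct(_), other) => {
            Err(format!("expected a list for a struct, found {:?}", other))
        }
    }
}
```

Every row is parsed, rebuilt, and then serialized again before arrow_json ever sees it, which is where the inefficiency comes from.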
**Additional context**
I don't personally need the ability to serialize a StructArray/RecordBatch
into a List, although that would seem symmetrical.
I am happy to make an RFC PR implementing this functionality.