jagill opened a new issue, #6558:
URL: https://github.com/apache/arrow-rs/issues/6558

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Currently, to deserialize a StructArray from JSON, you need to use a JSON 
Object.  E.g., deserializing a Struct<a: i32, b: string> would need something 
like `{"a": 1, "b": "c"}`.  This is also true of top-level RecordBatches.  Some 
services, such as Presto and Trino, serialize ROW fields as lists.  The example 
above would be serialized as `[1, "c"]`.  If you already know the schema, this 
is a more compact representation that reduces the data on the wire.
   
   I would like the ability for arrow_json to deserialize these list-encoded 
structs and record batches, perhaps under an option flag.
   
   **Describe the solution you'd like**
   
   When a StructArrayDecoder encounters a `[` (around 
[here](https://github.com/apache/arrow-rs/blob/master/arrow-json/src/reader/struct_array.rs#L75)),
 it switches to a parsing mode that does not look for field names, and requires 
a closing `]` for completion.  It should return a parsing error if either the 
number of entries of the list is not the same as the number of fields in the 
struct, or if any of the sub-parsers encounter the wrong type.  This requires 
the fields of the struct to be in the correct parsing order, while the current 
object parsing can shuffle the fields if they appear in a different order.
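   
   To make the two error conditions concrete, here is a minimal sketch of the 
positional rule.  It is not arrow_json code: it works on an already-parsed 
value tree rather than the decoder's tape, and `Value`, `FieldType` and 
`check_row` are hypothetical names made up for this illustration.

```rust
/// Hypothetical stand-in for an already-parsed JSON value
/// (not an arrow_json or serde_json type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Hypothetical stand-in for the Arrow data type of a struct field.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int32,
    Utf8,
    Struct(Vec<(String, FieldType)>),
}

/// Validates a list-encoded struct against `fields` purely by position.
fn check_row(fields: &[(String, FieldType)], row: &Value) -> Result<(), String> {
    // A list-encoded struct must be a JSON list; no field names are read.
    let items = match row {
        Value::List(items) => items,
        other => return Err(format!("expected a list, found {:?}", other)),
    };
    // Error condition 1: the entry count must equal the field count.
    if items.len() != fields.len() {
        return Err(format!(
            "expected {} entries, found {}",
            fields.len(),
            items.len()
        ));
    }
    // Error condition 2: each entry must match the type at the same index.
    for ((name, ty), item) in fields.iter().zip(items) {
        match (ty, item) {
            (FieldType::Int32, Value::Int(v))
                if *v >= i64::from(i32::MIN) && *v <= i64::from(i32::MAX) => {}
            (FieldType::Utf8, Value::Str(_)) => {}
            // Nested structs are list-encoded too, so recurse.
            (FieldType::Struct(children), Value::List(_)) => {
                check_row(children, item).map_err(|e| format!("field {}: {}", name, e))?
            }
            _ => {
                return Err(format!(
                    "field {}: expected {:?}, found {:?}",
                    name, ty, item
                ))
            }
        }
    }
    Ok(())
}
```

   With fields `a: Int32, b: Utf8`, the row `[1, "c"]` passes, while `[1]` 
(wrong length) and `["c", 1]` (wrong order, hence wrong types) are rejected.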
   
   **Describe alternatives you've considered**
   
   I currently parse the results with serde_json, then recursively walk the 
JSON to convert Lists to Objects using the schema.  Then I re-serialize the 
top-level JSON and read it using arrow_json.  This is not very efficient.
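   
   For reference, the list-to-object rewrite in this workaround looks roughly 
like the sketch below.  To keep it dependency-free it uses hypothetical `Json` 
and `Schema` enums in place of serde_json::Value and the Arrow schema; the 
function name `lists_to_objects` is also made up for this illustration.

```rust
/// Hypothetical stand-in for serde_json::Value, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Hypothetical stand-in for the parts of an Arrow schema that matter here.
#[derive(Debug, Clone)]
enum Schema {
    Scalar,
    List(Box<Schema>),
    Struct(Vec<(String, Schema)>),
}

/// Rewrites list-encoded structs into objects keyed by the schema's field names.
fn lists_to_objects(schema: &Schema, value: Json) -> Result<Json, String> {
    match (schema, value) {
        // A struct encoded as a list: pair each entry with its field name.
        (Schema::Struct(fields), Json::Array(items)) => {
            if fields.len() != items.len() {
                return Err(format!(
                    "expected {} entries, found {}",
                    fields.len(),
                    items.len()
                ));
            }
            let mut members = Vec::with_capacity(fields.len());
            for ((name, child), item) in fields.iter().zip(items) {
                members.push((name.clone(), lists_to_objects(child, item)?));
            }
            Ok(Json::Object(members))
        }
        // A genuine list stays a list, but its elements may contain structs.
        (Schema::List(child), Json::Array(items)) => items
            .into_iter()
            .map(|item| lists_to_objects(child, item))
            .collect::<Result<Vec<_>, _>>()
            .map(Json::Array),
        // Everything else is returned unchanged in this sketch.
        (_, other) => Ok(other),
    }
}
```

   The extra cost comes from building this intermediate tree for every row and 
then serializing it again before arrow_json ever sees the data.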
   
   Alternatively, I could reproduce a lower-quality copy of the version of 
arrow_json that deserialized using serde_json, either by writing serde_json 
decoders directly or by taking a serde_json::Value and populating the 
ArrayBuilders myself.  This is a lot of code duplication.
   
   **Additional context**
   
   I don't personally need the ability to serialize a StructArray/RecordBatch 
into a List, although that would seem symmetrical.
   
   I am happy to make an RFC PR implementing this functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

