jagill opened a new pull request, #6643:
URL: https://github.com/apache/arrow-rs/pull/6643

   # Which issue does this PR close?
   
   Closes #6558.
   
   # Rationale for this change
    
   Currently, a StructArray can only be deserialized from a JSON object (e.g. 
`{"a": 1, "b": "c"}`), but some services (e.g. Presto and Trino) encode ROW types 
as JSON lists (e.g. `[1, "c"]`) because this is more compact and the schema is 
already known.
   arrow-json cannot currently deserialize these.
   
   
   # What changes are included in this PR?
   
   This PR adds the ability to parse JSON lists into StructArrays when the 
StructParseMode is set to ListOnly.  In ListOnly mode, object-encoded structs 
raise an error.  Setting it to ObjectOnly (the default) keeps the original 
parsing behavior.
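
   The two modes described above can be sketched with a small self-contained 
model. This is only an illustration, not the arrow-json implementation: the 
`Json` type, the `decode_struct` function, the error strings, and the length 
check are all hypothetical, and only the mode names `ObjectOnly` and `ListOnly` 
come from this PR.

```rust
// Hypothetical, simplified JSON value model (assumption; arrow-json works on a
// token tape, not on a tree like this).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(i64),
    Str(String),
    Object(Vec<(String, Json)>),
    List(Vec<Json>),
}

// Mode names taken from the PR text; the enum definition itself is a sketch.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum StructParseMode {
    #[default]
    ObjectOnly,
    ListOnly,
}

/// Returns one value per schema field, in schema order.
/// `decode_struct` is a hypothetical helper used only for illustration.
fn decode_struct(
    value: &Json,
    fields: &[&str],
    mode: StructParseMode,
) -> Result<Vec<Option<Json>>, String> {
    match (mode, value) {
        // Object encoding: look each field up by name; missing fields are None.
        (StructParseMode::ObjectOnly, Json::Object(entries)) => Ok(fields
            .iter()
            .map(|name| {
                entries
                    .iter()
                    .find(|(k, _)| k.as_str() == *name)
                    .map(|(_, v)| v.clone())
            })
            .collect()),
        // List encoding: values are positional, so the schema supplies the names.
        (StructParseMode::ListOnly, Json::List(items)) => {
            // Assumption: a length mismatch is treated as an error in this sketch.
            if items.len() != fields.len() {
                return Err(format!(
                    "expected {} elements, found {}",
                    fields.len(),
                    items.len()
                ));
            }
            Ok(items.iter().cloned().map(Some).collect())
        }
        // Any other combination is rejected, mirroring "ListOnly mode raises an
        // error for object-encoded structs" (and vice versa).
        (StructParseMode::ObjectOnly, _) => Err("expected { for struct".to_string()),
        (StructParseMode::ListOnly, _) => Err("expected [ for struct".to_string()),
    }
}
```

   In this sketch, `[1, "c"]` decodes against the fields `a` and `b` only in 
`ListOnly` mode, while the object form decodes only in `ObjectOnly` mode.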
   
   # Are there any user-facing changes?
   
   Users may set the `StructParseMode` enum to `ListOnly` to parse list-style 
structs.  The associated enum, variants, and method have been documented.  I'm 
happy to update any other documentation.
   
   # Discussion topics
   
   1. I've made a StructParseMode enum instead of a bool flag for two reasons.  
One is that it's self-descriptive (what would `true` be?), and the other is 
that it allows a future Mixed mode that could deserialize either encoding.  No 
one has requested the latter yet.
   2. I kept the error messages as similar to the old messages as possible. I 
considered having more specific error messages (like "Encountered a '[' when 
parsing a Struct, but the StructParseMode is ObjectOnly" or similar), but 
wanted to hear opinions before I went that route.
   3. I'm not attached to any name/code-style/etc, so happy to modify to fit 
local conventions.
   4. One requirement was that benchmarks do not regress.  My benchmark runs 
have been inconclusive (see 
https://gist.github.com/jagill/6749248171a1f12fb7c653ff70c5ed42).  There are 
often small regressions or improvements in the single-digit % range whenever I 
switch between master and this PR.  I suspect they are statistical noise, but I 
wanted to note them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
