bionicles commented on issue #6522:
URL: https://github.com/apache/arrow-rs/issues/6522#issuecomment-2629787406

   I prototyped this last month for Polars and could share it, though it's a 
lot of code. One big issue is that the struct type isn't suited to JSON, 
because a struct needs a schema and assumes JSON documents are homogeneous. 
   
   Offset arrays don't make sense for deeply nested paths. Structs make sense 
for flat mappings with homogeneous fields, and the list type works for flat 
lists with a homogeneous value type.
   
   However, arbitrary JSON, such as mappings with heterogeneous keys, nested 
lists or list values in mappings, or heterogeneous flat leaf values with no 
keys, has many edge cases. 
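
   To make the distinction concrete, here is a minimal Python sketch (standard 
library only, not Arrow or Polars code) that checks whether a batch of JSON 
documents shares one struct-like shape. The helpers `shape_of` and 
`fits_one_struct` are hypothetical names made up for this illustration.

```python
import json

def shape_of(value):
    """Return a hashable description of a JSON value's structure."""
    if isinstance(value, dict):
        # Keys are unique, so sorting the pairs only ever compares key names.
        return ("struct", tuple(sorted((k, shape_of(v)) for k, v in value.items())))
    if isinstance(value, list):
        # Only the set of element shapes matters, not order or repetition.
        return ("list", frozenset(shape_of(v) for v in value))
    return type(value).__name__  # "int", "str", "bool", "NoneType", ...

def fits_one_struct(docs):
    """True when every document has exactly the same shape."""
    shapes = {shape_of(json.loads(d)) for d in docs}
    return len(shapes) == 1

# Flat mappings with the same fields: one schema describes them all.
homogeneous = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}']

# Different keys, a mixed-type list, a top-level list, and a bare leaf value.
heterogeneous = ['{"id": 1}', '{"tags": ["x", 1]}', '[1, {"k": null}]', '"leaf"']

print(fits_one_struct(homogeneous))
print(fits_one_struct(heterogeneous))
```

   The first batch maps cleanly onto a single struct; the second has no 
single shape to derive a schema from, which is the situation described above.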
   
   To support JSON robustly in Arrow, the best datatype to build on is 
string. 
   
   Alas, the normal string type does not cut it, because we need to know from 
the schema whether a string array holds normal text or JSON strings. If both 
normal text and JSON are "string", then the user needs to keep a separate 
schema outside the Arrow schema. That might work for one's own codebase, but 
not for someone else's. 
   
   Therefore I suggest adding a new datatype to Arrow that is identical to 
the string datatype except that it is named "json", to facilitate different 
handling of that kind of string (with serde).
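
   As a rough illustration of the idea (not an Arrow API), the Python sketch 
below models a column whose type tag is either "string" or "json" while the 
stored values are plain strings in both cases. The `Column` class and `decode` 
function are hypothetical, and Python's `json` module stands in for serde.

```python
import json
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    dtype: str    # "string", or the proposed "json" tag (hypothetical)
    values: list  # same storage in both cases: a list of str

def decode(column):
    """Parse values only when the schema says the column holds JSON."""
    if column.dtype == "json":
        return [json.loads(v) for v in column.values]
    # Plain text is returned untouched, even if it happens to look like JSON.
    return list(column.values)

notes = Column("notes", "string", ['{"not": "parsed"}', "plain text"])
payload = Column("payload", "json", ['{"a": [1, 2]}', '"leaf"', "[null, true]"])

print(decode(notes))
print(decode(payload))
```

   Because the tag travels with the schema, a consumer in someone else's 
codebase can tell the two columns apart without a side-channel schema.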

