mwylde commented on issue #7845:
URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1922454327

Our immediate concern (which motivated our json extension type and the
changes in https://github.com/ArroyoSystems/arrow-rs/tree/49.0.0/json) is being
able to support partial deserialization and serialization of json.

The Arrow type system is necessarily more limited than the schemas that can be
expressed with, for example, JSON Schema, and for a system like ours it's
important to be able to handle real-world json with arbitrarily complex schemas.

We do that by taking a JSON Schema and converting it into an Arrow schema as far
as that's possible; when we encounter a subschema that isn't representable in
Arrow, we use a utf8 field with the json extension type. With those arrow-json
changes, we can then partially deserialize/serialize the fields that have Arrow
types, while leaving the unsupported fields as string-encoded json.
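
As a rough illustration of this fallback, the sketch below converts a simplified model of JSON Schema into a simplified Arrow-like field model, and marks any subschema it can't represent as a utf8 field flagged as json. The `JsonSchema`, `ArrowType`, and `ArrowField` types and the `to_field` function are hypothetical stand-ins, not the arrow-rs or Arroyo APIs, and treating every `oneOf` as unrepresentable is an assumption made to keep the example short.

```rust
// Hypothetical, simplified model of a JSON Schema subset (not a real crate's API).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum JsonSchema {
    String,
    Integer,
    Number,
    Boolean,
    Array(Box<JsonSchema>),
    Object(Vec<(String, JsonSchema)>),
    // Assumed to have no direct Arrow equivalent in this sketch.
    OneOf(Vec<JsonSchema>),
    // Free-form value with no declared type.
    Any,
}

// Hypothetical stand-in for an Arrow data type; List holds a field, as in Arrow.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Utf8,
    Int64,
    Float64,
    Boolean,
    List(Box<ArrowField>),
    Struct(Vec<ArrowField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ArrowField {
    name: String,
    data_type: ArrowType,
    // Stands in for tagging a utf8 field with a json extension type.
    is_json: bool,
}

/// Converts a schema node to a field, falling back to utf8-encoded json
/// for anything the simplified Arrow model cannot represent.
fn to_field(name: &str, schema: &JsonSchema) -> ArrowField {
    let typed = |t: ArrowType| ArrowField { name: name.to_string(), data_type: t, is_json: false };
    let json_fallback = || ArrowField { name: name.to_string(), data_type: ArrowType::Utf8, is_json: true };

    match schema {
        JsonSchema::String => typed(ArrowType::Utf8),
        JsonSchema::Integer => typed(ArrowType::Int64),
        JsonSchema::Number => typed(ArrowType::Float64),
        JsonSchema::Boolean => typed(ArrowType::Boolean),
        // Items that are unrepresentable become a list of json strings.
        JsonSchema::Array(items) => typed(ArrowType::List(Box::new(to_field("item", items)))),
        // Each property is converted independently, so only the
        // unrepresentable properties fall back to json.
        JsonSchema::Object(props) => {
            typed(ArrowType::Struct(props.iter().map(|(n, s)| to_field(n, s)).collect()))
        }
        JsonSchema::OneOf(_) | JsonSchema::Any => json_fallback(),
    }
}
```

Because properties are converted one at a time, a single awkward subschema only turns that one field into a json string, while its siblings keep their native Arrow types. In a real implementation the `is_json` flag would instead be extension-type metadata on the Arrow field.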

Similarly, for tables defined via SQL DDL, we support a `JSON` type that has
the same behavior.

We would be interested in more native json support, particularly if we could
parse the json once and store it in a form that enables more efficient json
functions (which seems to be the direction @thinkharderdev is taking with the
tape representation).