progger-dev commented on code in PR #41257:
URL: https://github.com/apache/arrow/pull/41257#discussion_r1573871660


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,27 @@ Variable shape tensor
    Values inside each **data** tensor element are stored in row-major/C-contiguous
    order according to the corresponding **shape**.
 
+.. _json_extension:
+
+JSON
+====
+
+* Extension name: `arrow.json`.
+
+* The storage type of this extension is ``StringArray``
+  or ``LargeStringArray`` or ``StringViewArray``.
+  Only UTF-8 encoded JSON is supported.
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Metadata is either an empty string or a JSON string with an empty object.
+  In the future, additional fields may be added, but they are not required
+  to interpret the array.

Review Comment:
   I had three things in mind that I would like to be able to use the metadata field for:
   
   1. Parsing options that describe which features were used to generate the JSON, e.g. whether trailing commas or unquoted field names are allowed.
   2. Schemas.
   3. Specialized metadata used internally (e.g. BigQuery might want to store which fields were columnarized, so we could reinstantiate it if the user exports data and then reimports it).
   
   None of these were requirements, just potential uses we might have had for the metadata field. I'm no longer at Google, so I'll tag @emkornfield for more recent thoughts from the BQ team.
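
   To make the first item concrete, here is a rough sketch (not part of this PR) of how a reader could treat the serialized metadata today while tolerating optional fields later. The function name and the `parsing_options` / `allow_trailing_commas` keys are hypothetical and only illustrate the idea; nothing like them is defined by the proposed spec text.

```python
import json


def parse_json_extension_metadata(serialized: str) -> dict:
    """Decode arrow.json extension metadata into a dict.

    Per the proposed text, the metadata is either an empty string or a
    JSON string holding an object. Both "" and "{}" mean "no options".
    """
    if serialized == "":
        return {}
    metadata = json.loads(serialized)
    if not isinstance(metadata, dict):
        raise ValueError("arrow.json metadata must be a JSON object")
    # Unknown keys are kept but never required to interpret the array.
    return metadata


# Hypothetical future metadata carrying parser hints (assumed key names).
example = json.dumps({"parsing_options": {"allow_trailing_commas": True}})
options = parse_json_extension_metadata(example).get("parsing_options", {})
```

   A reader that does not know a given key can ignore it, which matches the statement in the diff that additional fields are not required to interpret the array.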


