progger-dev commented on code in PR #41257:
URL: https://github.com/apache/arrow/pull/41257#discussion_r1573871660
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,27 @@ Variable shape tensor
Values inside each **data** tensor element are stored in
row-major/C-contiguous
order according to the corresponding **shape**.
+.. _json_extension:
+
+JSON
+====
+
+* Extension name: `arrow.json`.
+
+* The storage type of this extension is ``StringArray`` or
+ ``LargeStringArray`` or ``StringViewArray``.
+ Only UTF-8 encoded JSON is supported.
+
+* Extension type parameters:
+
+ This type does not have any parameters.
+
+* Description of the serialization:
+
+ Metadata is either an empty string or a JSON string with an empty object.
+ In the future, additional fields may be added, but they are not required
+ to interpret the array.
Review Comment:
I had three things in mind that I would like to be able to use the
metadata field for:
1. Parsing options that describe which features were used to generate the
JSON, e.g. trailing commas allowed, unquoted field names, etc.
2. Schemas.
3. Specialized metadata used internally (e.g. BigQuery might want to store
which fields were columnarized, so we could reinstantiate it if the user exports
data and then reimports it).
None of these were requirements, just potential things that we
might have used the metadata field for. I'm no longer at Google, so I'll tag
@emkornfield for more recent thoughts from the BQ team.
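To make the three ideas above a bit more concrete, here is a rough sketch of how a forward-compatible reader could treat the metadata. It is only an illustration: the function name and the keys `parsing_options`, `schema` and `vendor` are hypothetical and not part of the proposal. The only rule taken from the spec text in this diff is that the metadata is either an empty string or a JSON object, and that any extra fields are not required to interpret the array.

```python
import json


def parse_json_extension_metadata(serialized: str) -> dict:
    """Return the metadata as a dict; "" and "{}" both yield {}.

    Hypothetical helper, not an Arrow API.
    """
    if serialized == "":
        return {}
    metadata = json.loads(serialized)
    if not isinstance(metadata, dict):
        raise ValueError("arrow.json metadata must be a JSON object")
    return metadata


# Hypothetical future payload; none of these keys exist in the spec.
example = json.dumps({
    "parsing_options": {
        "allow_trailing_commas": True,
        "allow_unquoted_field_names": True,
    },
    "schema": {"type": "object"},
    "vendor": {"bigquery": {"columnarized_fields": ["user.id"]}},
})

metadata = parse_json_extension_metadata(example)
# A reader that does not understand a key can simply ignore it.
parsing_options = metadata.get("parsing_options", {})
```

Readers that only know the current spec would see `{}` for both allowed values today and could skip any keys added later.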
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.