Joris Van den Bossche created ARROW-15552:
---------------------------------------------
Summary: [Docs][Format] Unclear wording about base64 encoding
requirement of metadata values
Key: ARROW-15552
URL: https://issues.apache.org/jira/browse/ARROW-15552
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation, Format
Reporter: Joris Van den Bossche
The C Data Interface docs suggest that values in key-value metadata should
be base64 encoded. This appears in the section describing which key-value
metadata to use for extension types
(https://arrow.apache.org/docs/format/CDataInterface.html#extension-arrays):
bq. The base64 encoding of metadata values ensures that any possible
serialization is representable.
This might not be fully correct, though (or at least it is not required, as
the current wording implies). A binary blob (such as a serialized schema)
can be base64 encoded, as we do when storing the Arrow schema in the Parquet
metadata, but is that actually required here?
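To illustrate why base64 should be optional, here is a minimal Python sketch. It assumes the binary layout the C Data Interface spec gives for ArrowSchema.metadata: an int32 pair count, then for each pair an int32 key length, the key bytes, an int32 value length and the value bytes, all in native endianness. Because every value is length-prefixed, an arbitrary binary blob (here deliberately invalid UTF-8) round-trips unchanged, so base64 becomes a choice of whoever serializes the value rather than a requirement of the format. The helper names encode_c_metadata/decode_c_metadata and the extension name "example.ext" are hypothetical and only for illustration.

```python
import base64
import struct


def encode_c_metadata(pairs):
    """Encode (key, value) byte pairs in the ArrowSchema.metadata layout.

    Layout, native endianness ("=" in struct): int32 N, then per pair
    int32 key length, key bytes, int32 value length, value bytes.
    Values are stored as raw bytes; no base64 step is applied.
    """
    out = bytearray(struct.pack("=i", len(pairs)))
    for key, value in pairs:
        out += struct.pack("=i", len(key)) + key
        out += struct.pack("=i", len(value)) + value
    return bytes(out)


def decode_c_metadata(buf):
    """Inverse of encode_c_metadata: return a list of (key, value) bytes pairs."""
    (n,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = []
    for _ in range(n):
        (klen,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        key = buf[offset:offset + klen]
        offset += klen
        (vlen,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        value = buf[offset:offset + vlen]
        offset += vlen
        pairs.append((key, value))
    return pairs


# Usage: a raw binary value (0xFF is never valid UTF-8) survives a round trip.
blob = bytes([0x00, 0xFF, 0xFE, 0x80])
pairs = [
    (b"ARROW:extension:name", b"example.ext"),  # hypothetical extension name
    (b"ARROW:extension:metadata", blob),        # raw bytes, not base64
]
assert decode_c_metadata(encode_c_metadata(pairs)) == pairs

# Base64 is still available if a producer wants an ASCII-only value,
# as done for the Arrow schema stored in Parquet metadata.
ascii_value = base64.b64encode(blob)
```

The design point is that the C Data Interface never interprets metadata values, so a requirement to base64 encode them would have to come from a particular extension type's own conventions, not from the interface itself.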
cc [~apitrou]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)