In the C++ library at least, uniqueness is never asserted when reading and writing the IPC metadata [1] [2]. If you use KeyValueMetadata::FindKey and the keys are non-unique, it will return the first one it finds. KeyValueMetadata::Merge assumes uniqueness, and the KeyValueMetadata::ToUnorderedMap function will drop all but one duplicate.
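
To make those lookup and conversion behaviors concrete, here is a minimal self-contained sketch. It does not use Arrow at all: Pairs, FindFirst and ToMap are hypothetical stand-ins written only for illustration, and which duplicate survives the map conversion (the first one here) is a choice of this sketch, not a statement about what ToUnorderedMap guarantees.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ordered key/value metadata list.
// This is NOT arrow::KeyValueMetadata; duplicate keys are allowed.
using Pairs = std::vector<std::pair<std::string, std::string>>;

// Linear scan over the list: returns the index of the first entry whose
// key matches, or -1 if the key is absent. Later duplicates are never seen.
int FindFirst(const Pairs& md, const std::string& key) {
  for (std::size_t i = 0; i < md.size(); ++i) {
    if (md[i].first == key) return static_cast<int>(i);
  }
  return -1;
}

// Copy the list into a hash map. emplace() does nothing when the key is
// already present, so only the first value for each key is kept and the
// remaining duplicates are silently dropped (an assumption of this sketch).
std::unordered_map<std::string, std::string> ToMap(const Pairs& md) {
  std::unordered_map<std::string, std::string> out;
  out.reserve(md.size());
  for (const auto& kv : md) out.emplace(kv.first, kv.second);
  return out;
}
```

With the list {("a", "1"), ("b", "2"), ("a", "3")}, FindFirst returns index 0 for "a", and ToMap yields two entries, so the value "3" is lost without any error being raised.
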
In Parquet, the metadata is also a list of KeyValue pairs with no qualifications [3].

My weak preference is to leave it to applications to make assertions about uniqueness. In either case since the metadata is ordered in the integration tests it would make sense to serialize as a list of key/value pairs like

{"key": $key, "value": $value}

[1]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L463
[2]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L471
[3]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L728

On Wed, Mar 11, 2020 at 12:11 PM Ben Kietzman <ben.kietz...@rstudio.com> wrote:
>
> While working on https://issues.apache.org/jira/browse/ARROW-2255
> (serialize custom_metadata in the integration tests), we had the following
> discussion on GitHub:
> https://github.com/apache/arrow/pull/6556#pullrequestreview-372405940
>
> In short, although in Schema.fbs custom_metadata is declared as an array of
> KeyValue pairs (so duplicate keys would be possible), all reference
> implementations assume it to represent an associative map with unique keys.
>
> Is there a use case for duplicate metadata keys? It seems that an
> acceptable resolution might be to note in Schema.fbs that implementations
> are allowed to assume that keys are unique
>
> Ben