In the C++ library at least, uniqueness is never asserted when reading
and writing the IPC metadata [1] [2]. If you use
KeyValueMetadata::FindKey and the keys are non-unique, it will return
the first one it finds. KeyValueMetadata::Merge assumes uniqueness,
and the KeyValueMetadata::ToUnorderedMap function will drop all but
one duplicate.
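
To make the consequences concrete, here is a rough sketch in plain Python. It only models the behavior described above and does not use the Arrow API; `find_key` and `to_map` are hypothetical names. Which duplicate survives the conversion to a map is an assumption (the first one is kept here).

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an ordered metadata list; not the Arrow API.
Pairs = List[Tuple[str, str]]


def find_key(pairs: Pairs, key: str) -> int:
    """Return the index of the first pair whose key matches, or -1."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            return i
    return -1


def to_map(pairs: Pairs) -> Dict[str, str]:
    """Collapse the list into a map, losing all but one value per key.

    Assumption: the first occurrence of each key is the one kept.
    """
    out: Dict[str, str] = {}
    for k, v in pairs:
        out.setdefault(k, v)  # later duplicates are silently ignored
    return out


# "unit" appears twice, which a list of pairs permits.
metadata: Pairs = [("unit", "ms"), ("source", "sensor-a"), ("unit", "s")]

first_index = find_key(metadata, "unit")  # only the first match is visible
collapsed = to_map(metadata)              # one "unit" entry is lost
```

With three pairs going in and two entries coming out, a round trip through the map form is lossy whenever keys repeat.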

In Parquet, the metadata is also a list of KeyValue pairs, with no
uniqueness requirement stated [3].

My weak preference is to leave it to applications to make assertions
about uniqueness. Either way, since the metadata is ordered, it would
make sense in the integration tests to serialize it as a list of
key/value pairs like {"key": $key, "value": $value}.
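
A minimal sketch of that list form, using only Python's standard `json` module. The helper names `metadata_to_json` and `metadata_from_json` are hypothetical and this is not the integration test code itself; it just shows that a list of pairs keeps both order and duplicate keys.

```python
import json

# Ordered pairs, including a repeated key.
metadata = [("unit", "ms"), ("source", "sensor-a"), ("unit", "s")]


def metadata_to_json(pairs):
    """Serialize pairs as a JSON list of {"key": ..., "value": ...} objects."""
    return json.dumps([{"key": k, "value": v} for k, v in pairs])


def metadata_from_json(text):
    """Read the list form back into ordered (key, value) tuples."""
    return [(item["key"], item["value"]) for item in json.loads(text)]


encoded = metadata_to_json(metadata)
decoded = metadata_from_json(encoded)  # same order, duplicates intact
```

A single JSON object keyed by metadata key would not round-trip this input, since the second "unit" entry would have nowhere to go.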

[1]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L463
[2]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/metadata_internal.cc#L471
[3]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L728

On Wed, Mar 11, 2020 at 12:11 PM Ben Kietzman <ben.kietz...@rstudio.com> wrote:
>
> While working on https://issues.apache.org/jira/browse/ARROW-2255
> (serialize custom_metadata in the integration tests), we had the following
> discussion on GitHub:
> https://github.com/apache/arrow/pull/6556#pullrequestreview-372405940
>
> In short, although in Schema.fbs custom_metadata is declared as an array of
> KeyValue pairs (so duplicate keys would be possible), all reference
> implementations assume it to represent an associative map with unique keys.
>
> Is there a use case for duplicate metadata keys? It seems that an
> acceptable resolution might be to note in Schema.fbs that implementations
> are allowed to assume that keys are unique
>
> Ben
