Hi Micah,

Parquet-MR does not have its own data model (except for an example implementation used for unit tests), so it is up to each data model to decide how the values are handled. I think it is possible to store key-value pairs with the same key using the example implementation, but there are no tests for this. I am not sure how the other bindings (arrow, avro, thrift, protobuf, pig) work.
I think the specification is written this way to cover how parquet-mr works, because no checks are implemented to ensure that only one key-value pair is stored or read per key. So technically it is possible to use a Parquet MAP as a multi-map, but it might not be interoperable with other implementations or with other models in parquet-mr.

Regards,
Gabor

On Mon, Oct 25, 2021 at 9:07 PM Micah Kornfield <[email protected]> wrote:

> Hi dev@parquet,
> The Logical Type Specification [1] has the following to say about duplicate
> keys.
>
> > If there are multiple key-value pairs for the same key, then the final
> > value for that key must be the last value. Other values may be ignored or
> > may be added with replacement to the map container in the order that they
> > are encoded. The MAP annotation should not be used to encode multi-maps
> > using duplicate keys.
>
> I was wondering if anybody was aware of systems that use this in practice
> (i.e. write out duplicate keys and rely on the reader to deduplicate them).
>
> Thanks,
> Micah
>
> [1]
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
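
To make the difference concrete, here is a minimal sketch in plain Python of the two ways a reader could treat duplicate keys. It does not use parquet-mr or any Parquet binding; the function names `read_as_map` and `read_as_multimap` and the sample data are hypothetical and only illustrate the rule quoted from the specification (the last value for a key wins) versus a multi-map interpretation.

```python
from collections import defaultdict


def read_as_map(pairs):
    """Build a map from key-value pairs in encoded order; the last value wins."""
    result = {}
    for key, value in pairs:
        # A later pair with the same key replaces the earlier value.
        result[key] = value
    return result


def read_as_multimap(pairs):
    """Keep every value per key (the usage the specification advises against)."""
    result = defaultdict(list)
    for key, value in pairs:
        result[key].append(value)
    return dict(result)


# Hypothetical key-value pairs in the order they were encoded.
encoded = [("a", 1), ("b", 2), ("a", 3)]

print(read_as_map(encoded))       # {'a': 3, 'b': 2}
print(read_as_multimap(encoded))  # {'a': [1, 3], 'b': [2]}
```

A writer that relies on the second behaviour would lose data with any reader that implements the first, which is the interoperability risk mentioned above.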
