Hi Micah,

Parquet-MR does not have its own data model (except an example
implementation used for unit tests). So it is up to the data model how the
values are handled. I think it is possible to store key-value pairs with
the same key using the example implementation, but there are no tests for this.
I am not sure how the other bindings (arrow, avro, thrift, protobuf, pig)
work.

I think the specification is written this way to cover how parquet-mr works,
because parquet-mr implements no checks that restrict writing or reading to a
single key-value pair per key. So technically it is possible to use a Parquet
MAP as a multi-map, but it might not be interoperable with other
implementations or with other models in parquet-mr.
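To make the interoperability concern concrete, here is a minimal Python sketch (not parquet-mr code; the function names are hypothetical) of two ways a reader could interpret the same repeated key-value entries of one MAP value. The first follows the last-value-wins rule the specification describes; the second is a multi-map reading that keeps every value. Plain (key, value) tuples stand in for the decoded entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def read_as_map(pairs: Iterable[Tuple[K, V]]) -> Dict[K, V]:
    """Hypothetical spec-style reader: for duplicate keys, the last encoded value wins."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        # Assigning in encoding order overwrites any earlier value for the same key.
        result[key] = value
    return result


def read_as_multimap(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Hypothetical multi-map reader: keeps every value for a key, in encoding order."""
    result: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        result[key].append(value)
    return dict(result)


# Key-value entries of a single MAP value, in the order they were encoded.
entries = [("a", 1), ("b", 2), ("a", 3)]

print(read_as_map(entries))       # {'a': 3, 'b': 2}
print(read_as_multimap(entries))  # {'a': [1, 3], 'b': [2]}
```

A file written with duplicate keys decodes without error under both readers, but they disagree on the result, which is why relying on multi-map semantics is unlikely to be portable.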

Regards,
Gabor

On Mon, Oct 25, 2021 at 9:07 PM Micah Kornfield
wrote:

> Hi dev@parquet,
> The Logical Type Specification [1] has the following to say about duplicate
> keys.
>
> > If there are multiple key-value pairs for the same key, then the final
> > value for that key must be the last value. Other values may be ignored or
> > may be added with replacement to the map container in the order that they
> > are encoded. The MAP annotation should not be used to encode multi-maps
> > using duplicate keys.
>
>
> I was wondering if anybody was aware of systems that use this in practice
> (i.e. write out duplicate keys and rely on the reader to deduplicate them).
>
> Thanks,
> Micah
>
>
> [1]
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
>
