Hello!  You're correct that there is no guidance in the spec, and that
conceptually an Avro map could be represented as an ordered list of
(key, value) tuples.

It's only an opinion, but I think the *best practice* for
interoperability would be to avoid serializing duplicate keys.
Whoever deserializes binary data containing a map with duplicate
keys must choose which value to keep (or keep them all, or throw an
error), and given the spec, the writer has no way to predict what the
reader will end up with.
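
To make the ambiguity concrete, here is a rough sketch of the choices a
reader faces once it has the entries as an ordered list of (key, value)
pairs. Nothing here comes from the spec or from any Avro library: the
function name `resolve_duplicates` and the policy names are made up for
illustration.

```python
def resolve_duplicates(pairs, policy="keep_last"):
    """Collapse ordered (key, value) pairs under a named policy.

    The policy names are hypothetical; the Avro spec defines none of them.
    """
    if policy not in ("keep_last", "keep_first", "keep_all", "error"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "keep_all":
        # Preserve every entry, in the order it was read.
        return list(pairs)
    result = {}
    for key, value in pairs:
        if key in result:
            if policy == "error":
                raise ValueError(f"duplicate map key: {key!r}")
            if policy == "keep_first":
                continue  # ignore later values for a key already seen
        result[key] = value  # keep_last overwrites the earlier value
    return result


pairs = [("a", 1), ("b", 2), ("a", 3)]
print(resolve_duplicates(pairs, "keep_last"))
print(resolve_duplicates(pairs, "keep_first"))
print(resolve_duplicates(pairs, "keep_all"))
```

The same input gives three different results depending on the policy,
which is exactly what a writer of duplicate keys cannot predict.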

I'd be tempted to say "keep last" should be the rule, and that it
should be added to the spec for this case... But I don't really have a
very good justification!

It's an interesting question, because other than the non-unique key
problem, there's no reason that you couldn't represent the map as you
suggest.
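
For what it's worth, the binary encoding itself reads naturally as a
list: a map is a series of blocks, each a long count followed by that
many key/value pairs, ending with a zero count (a negative count means
its absolute value is the count, followed by a long block size in
bytes). Below is a minimal sketch of reading that into ordered pairs
with no uniqueness check. It is not code from any Avro implementation;
the function names are made up, and it assumes the map values are
longs to keep the example short.

```python
import io


def read_long(buf):
    """Read one zigzag, variable-length encoded long."""
    acc = 0
    shift = 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated long")
        acc |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo zigzag


def read_string(buf):
    """Read a length-prefixed UTF-8 string (map keys are strings)."""
    length = read_long(buf)
    raw = buf.read(length)
    if len(raw) != length:
        raise EOFError("truncated string")
    return raw.decode("utf-8")


def read_map_pairs(buf, read_value=read_long):
    """Read a map as ordered (key, value) pairs, duplicates included.

    Assumption: values are longs unless another reader is passed in.
    """
    pairs = []
    while True:
        count = read_long(buf)
        if count == 0:
            return pairs  # a zero count ends the map
        if count < 0:
            count = -count
            read_long(buf)  # block size in bytes, not needed here
        for _ in range(count):
            key = read_string(buf)
            pairs.append((key, read_value(buf)))


# One block of three entries: ("a", 1), ("b", 2), ("a", 3), then the end marker.
data = bytes([0x06, 0x02, 0x61, 0x02, 0x02, 0x62, 0x04, 0x02, 0x61, 0x06, 0x00])
print(read_map_pairs(io.BytesIO(data)))
```

Passing the result to `dict(...)` would give "keep last" behaviour,
which is presumably what most map-backed implementations end up doing.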

Ryan




On Fri, Apr 21, 2023 at 5:52 PM Jack Klamer
<[email protected]> wrote:
>
> Hello Avro Devs,
>
> I am looking for clarification in the spec as I am working on a
> particular implementation for Avro data reading. Most Avro implementations
> serialize/deserialize maps using language specific maps that guarantee one
> value per key is written and the last value per key read is returned. There
> however is no guidance in the spec on whether that is
> necessary/implied/expected.
>
> For my use case, I am deserializing keys and values into lists of each,
> without checking for uniqueness, and want to know if this breaks the spec,
> or if those who serialize maps without unique keys (presumably outside the
> bound of most language implementations) can expect undefined behavior when
> reading?
>
> --
> Jack Klamer
>
> Software Engineer
>
> *he/him*
> [email protected]
> <https://www.starburst.io/>
