Hello everyone,

We are facing an interesting use case with Avro deserialization.
As part of one of our systems, we receive triggers to pull raw Avro bytes out of our data layer and deserialize them. For many months this worked without issue: deserialization was performed with the latest reader schema alongside the specific datum reader.

Recently, the schema for one of the relevant objects was updated in a way the registry deemed backward-transitive compatible, yet deserialization began to fail. Digging deeper, we found this was because the deserialization code was explicitly casting fields based on the field-level ordering of the object at the root level. To clarify: once we had compiled an adjacent object matching the Avro schema, we could see that the fields in some of the case statements rely on this ordering, which breaks our deserialization flow.

To mitigate this, we have some hacks that use the reader and writer schemas in tandem to perform deserialization, but doing this over billions of records has badly hurt our performance.

My question is: how should we handle this situation on our end? I'm happy to elaborate further on the problem and provide examples as well.

Thanks so much,
Ryan Schachte
