Re: Avro deserialization and field-level ordering

Ryan Skraba Thu, 23 Jul 2020 08:27:40 -0700

An example would probably be the best way to help clarify!

This sentence is where it's interesting and confusing to me: "The
deserialization was performed with the latest reader schema alongside
the specific datum reader."


There are very few schema evolutions that will happily swap out schemas
without providing the _exact_ writer schema that was used, but it
seems like you're aware of that!  What are the hacks you're applying?
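
To make the point about the writer schema concrete, here is a minimal
sketch in plain Python (no Avro library). It assumes a toy schema
representation, a list of (name, type) pairs, and only the "long" and
"string" types; the helper names are made up for illustration. The byte
layout follows the Avro binary encoding: a record is just its field values
concatenated in the writer's declared order, with no field names or tags.
Real schema resolution also handles defaults, aliases and type promotion,
which this sketch leaves out.

```python
import io


def read_exact(buf, n):
    """Read exactly n bytes or fail loudly."""
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("ran out of bytes; schema does not match the data")
    return data


def write_long(out, n):
    """Avro long: zigzag encoding, then base-128 varint (low group first)."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(buf):
    shift, z = 0, 0
    while True:
        b = read_exact(buf, 1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def write_string(out, s):
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    raw = s.encode("utf-8")
    write_long(out, len(raw))
    out.write(raw)


def read_string(buf):
    return read_exact(buf, read_long(buf)).decode("utf-8")


WRITERS = {"long": write_long, "string": write_string}
READERS = {"long": read_long, "string": read_string}


def encode(record, schema):
    """Concatenate field values in the schema's declared order."""
    out = io.BytesIO()
    for name, typ in schema:
        WRITERS[typ](out, record[name])
    return out.getvalue()


def decode_positional(data, schema):
    """Decode using one schema only, trusting its field order."""
    buf = io.BytesIO(data)
    return {name: READERS[typ](buf) for name, typ in schema}


def decode_resolved(data, writer_schema, reader_schema):
    """Decode in the writer's order, then pick the reader's fields by name."""
    buf = io.BytesIO(data)
    written = {name: READERS[typ](buf) for name, typ in writer_schema}
    return {name: written[name] for name, _ in reader_schema}


# Hypothetical schemas: same fields, different declaration order.
WRITER_SCHEMA = [("id", "long"), ("name", "string")]
READER_SCHEMA = [("name", "string"), ("id", "long")]

if __name__ == "__main__":
    data = encode({"id": 42, "name": "avro"}, WRITER_SCHEMA)
    print(decode_resolved(data, WRITER_SCHEMA, READER_SCHEMA))
    try:
        decode_positional(data, READER_SCHEMA)
    except EOFError as err:
        print("reader schema alone fails:", err)
```

With only the reader schema, the bytes of "id" get interpreted as the
length of "name" and decoding falls over (or, with same-typed fields,
silently swaps values). In the Java API the equivalent of decode_resolved
is constructing the datum reader with both schemas, e.g.
new SpecificDatumReader<>(writerSchema, readerSchema).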

If you think you've found a bug or a performance bottleneck with
schema resolution, then please do raise a JIRA!  Especially if you
have a fix or ideas for improvements.

All my best, Ryan

On Thu, Jul 23, 2020 at 1:32 AM Andy Le <[email protected]> wrote:
>
> Hi Ryan,
>
> Would you please share your POC?
>
> Thank you.
>
> On 2020/07/22 22:54:04, Ryan Schachte <[email protected]> wrote:
> > Hello everyone,
> > We are facing an interesting use-case with respect to Avro and
> > deserialization.
> >
> > As part of one of our systems, we get triggers to pull raw Avro bytes out
> > of our data layer and deserialize them. For many months, we never had
> > an issue with this. The deserialization was performed with the latest
> > reader schema alongside the specific datum reader.
> >
> > Recently, the schema of one of the relevant objects was updated. The
> > change was deemed backward-transitive from a registry perspective, but
> > deserialization began to fail. Digging deeper, we found that the
> > deserialization code explicitly casts fields based on the field-level
> > ordering of the object at the root level. To clarify further: once we
> > compiled an adjacent object matching the Avro schema, we noticed that
> > the fields in some of the case statements rely on this ordering, which
> > breaks our deserialization flow.
> >
> > To mitigate this issue, we have some hacks involving both the reader and
> > writer schema in tandem to perform deserialization, but doing this
> > operation on billions of records has destroyed a lot of our performance.
> >
> > My question is, how should we handle this situation on our end? I'm happy
> > to further elaborate on the problem and provide examples as well.
> >
> > Thanks so much,
> > Ryan Schachte
> >
