Hi guys, thanks for the quick reply. I will try to get an example POC out as quickly as possible so we can further the discussion. Thanks so much!
Cheers,
Ryan

On Thu, Jul 23, 2020 at 8:26 AM Ryan Skraba <[email protected]> wrote:
> An example would probably be the best way to help clarify!
>
> This sentence is where it's interesting and confusing to me: "The
> deserialization was performed with the latest reader schema alongside
> the specific datum reader."
>
> There are very few schema evolutions that will happily swap out schemas
> without providing the _exact_ writer schema that was used, but it
> seems like you're aware of that! What are the hacks you're applying?
>
> If you think you've found a bug or a performance bottleneck with
> schema resolution, then please do raise a JIRA! Especially if you
> have a fix or ideas for improvements.
>
> All my best, Ryan
>
> On Thu, Jul 23, 2020 at 1:32 AM Andy Le <[email protected]> wrote:
> >
> > Hi Ryan,
> >
> > Would you please share your POC?
> >
> > Thank you.
> >
> > On 2020/07/22 22:54:04, Ryan Schachte <[email protected]> wrote:
> > > Hello everyone,
> > > We are facing an interesting use case with respect to Avro and
> > > deserialization.
> > >
> > > As part of one of our systems, we get triggers to pull raw Avro
> > > bytes out of our data layer and deserialize them. For many months,
> > > we never had an issue with this. The deserialization was performed
> > > with the latest reader schema alongside the specific datum reader.
> > >
> > > Recently, a schema within one of the relevant objects was changed
> > > and deemed backward-transitive from a registry perspective; however,
> > > deserialization began to fail. Diving deeper into this issue, we
> > > found that the deserialization was explicitly casting fields based
> > > on the field-level ordering of the object at the root level. To
> > > further clarify, once we had compiled an adjacent object matching
> > > the Avro schema, you can see that the fields in some of the case
> > > statements rely on this ordering, which breaks our deserialization
> > > flow.
> > >
> > > To mitigate this issue, we have some hacks involving both the
> > > reader and writer schemas in tandem to perform deserialization, but
> > > doing this operation on billions of records has destroyed a lot of
> > > our performance.
> > >
> > > My question is, how should we handle this situation on our end? I'm
> > > happy to further elaborate on the problem and provide examples as
> > > well.
> > >
> > > Thanks so much,
> > > Ryan Schachte
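
For reference, a minimal sketch of what "providing the exact writer
schema" looks like with the Avro Java API. The method and variable
names (decode, writerSchemaJson, readerSchemaJson, rawBytes) are
illustrative placeholders, not anything from the thread; the
two-schema GenericDatumReader constructor is the part that performs
schema resolution:

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    public class ResolvedRead {
        // writerSchemaJson must be the exact schema the bytes were
        // encoded with; readerSchemaJson is the latest schema the
        // application wants to read into.
        static GenericRecord decode(String writerSchemaJson,
                                    String readerSchemaJson,
                                    byte[] rawBytes) throws IOException {
            Schema writer = new Schema.Parser().parse(writerSchemaJson);
            Schema reader = new Schema.Parser().parse(readerSchemaJson);

            // The two-schema constructor resolves writer -> reader:
            // fields are matched by name rather than position, so
            // reordered fields or fields added with defaults no longer
            // break decoding.
            GenericDatumReader<GenericRecord> datumReader =
                    new GenericDatumReader<>(writer, reader);

            BinaryDecoder decoder =
                    DecoderFactory.get().binaryDecoder(rawBytes, null);
            return datumReader.read(null, decoder);
        }
    }

SpecificDatumReader has the same two-schema constructor if generated
classes are preferred over GenericRecord.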

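On the performance point above: building a new DatumReader, and hence
redoing schema resolution, for every record is expensive. A common
mitigation, sketched here on the assumption that each record's writer
schema can be looked up (e.g. from a registry), is to cache one
resolving reader per distinct writer schema, keyed by its fingerprint.
ReaderCache and readerFor are hypothetical names:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaNormalization;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class ReaderCache {
        private final Schema readerSchema;
        // One reader per distinct writer schema, keyed by its 64-bit
        // parsing fingerprint, so the resolution cost is paid once per
        // schema rather than once per record.
        private final ConcurrentMap<Long, GenericDatumReader<GenericRecord>> cache =
                new ConcurrentHashMap<>();

        public ReaderCache(Schema readerSchema) {
            this.readerSchema = readerSchema;
        }

        public GenericDatumReader<GenericRecord> readerFor(Schema writerSchema) {
            long fp = SchemaNormalization.parsingFingerprint64(writerSchema);
            return cache.computeIfAbsent(
                    fp, k -> new GenericDatumReader<>(writerSchema, readerSchema));
        }
    }

BinaryDecoder instances can also be reused via the second argument of
DecoderFactory.get().binaryDecoder(bytes, reuse) to cut per-record
allocation.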