Hi guys, thanks for the quick reply. I will try to get an example POC out as quickly as possible so we can further the discussion. Thanks so much!
Cheers,
Ryan

On Thu, Jul 23, 2020 at 8:26 AM Ryan Skraba <[email protected]> wrote:
> An example would probably be the best way to help clarify!
>
> This sentence is where it's interesting and confusing to me: "The
> deserialization was performed with the latest reader schema alongside
> the specific datum reader."
>
> There are very few schema evolutions that will happily swap out schemas
> without providing the _exact_ writer schema that was used, but it
> seems like you're aware of that! What are the hacks you're applying?
>
> If you think you've found a bug or a performance bottleneck with
> schema resolution, then please do raise a JIRA! Especially if you
> have a fix or ideas for improvements.
>
> All my best, Ryan
>
> On Thu, Jul 23, 2020 at 1:32 AM Andy Le <[email protected]> wrote:
> >
> > Hi Ryan,
> >
> > Would you please share your POC?
> >
> > Thank you.
> >
> > On 2020/07/22 22:54:04, Ryan Schachte <[email protected]> wrote:
> > > Hello everyone,
> > > We are facing an interesting use case with respect to Avro and
> > > deserialization.
> > >
> > > As part of one of our systems, we get triggers to pull raw Avro
> > > bytes out of our data layer and deserialize them. For many months,
> > > we never had an issue with this. The deserialization was performed
> > > with the latest reader schema alongside the specific datum reader.
> > >
> > > Recently, a schema within one of the relevant objects was changed
> > > and deemed backward-transitive from a registry perspective; however,
> > > deserialization began to fail. Diving deeper into this issue, we
> > > found that the deserialization was explicitly casting fields based
> > > on the field-level ordering of the object at the root level. To
> > > further clarify, once we had compiled an adjacent object matching
> > > the Avro schema, you can see that the fields in some of the case
> > > statements rely on this ordering, which breaks our deserialization
> > > flow.
> > >
> > > To mitigate this issue, we have some hacks involving both the
> > > reader and writer schemas in tandem to perform deserialization, but
> > > doing this operation on billions of records has destroyed a lot of
> > > our performance.
> > >
> > > My question is, how should we handle this situation on our end? I'm
> > > happy to further elaborate on the problem and provide examples as
> > > well.
> > >
> > > Thanks so much,
> > > Ryan Schachte
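
For reference, a minimal sketch of what "providing the exact writer
schema" looks like with the Avro Java API. The method and variable
names (decode, writerSchemaJson, readerSchemaJson, rawBytes) are
illustrative placeholders, not anything from the thread; the
two-schema GenericDatumReader constructor is the part that performs
schema resolution:

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    public class ResolvedRead {
        // writerSchemaJson must be the exact schema the bytes were
        // encoded with; readerSchemaJson is the latest schema the
        // application wants to read into.
        static GenericRecord decode(String writerSchemaJson,
                                    String readerSchemaJson,
                                    byte[] rawBytes) throws IOException {
            Schema writer = new Schema.Parser().parse(writerSchemaJson);
            Schema reader = new Schema.Parser().parse(readerSchemaJson);

            // The two-schema constructor resolves writer -> reader:
            // fields are matched by name rather than position, so
            // reordered fields or fields added with defaults no longer
            // break decoding.
            GenericDatumReader<GenericRecord> datumReader =
                    new GenericDatumReader<>(writer, reader);

            BinaryDecoder decoder =
                    DecoderFactory.get().binaryDecoder(rawBytes, null);
            return datumReader.read(null, decoder);
        }
    }

SpecificDatumReader has the same two-schema constructor if generated
classes are preferred over GenericRecord.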

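On the performance point above: building a new DatumReader, and hence
redoing schema resolution, for every record is expensive. A common
mitigation, sketched here on the assumption that each record's writer
schema can be looked up (e.g. from a registry), is to cache one
resolving reader per distinct writer schema, keyed by its fingerprint.
ReaderCache and readerFor are hypothetical names:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaNormalization;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class ReaderCache {
        private final Schema readerSchema;
        // One reader per distinct writer schema, keyed by its 64-bit
        // parsing fingerprint, so the resolution cost is paid once per
        // schema rather than once per record.
        private final ConcurrentMap<Long, GenericDatumReader<GenericRecord>> cache =
                new ConcurrentHashMap<>();

        public ReaderCache(Schema readerSchema) {
            this.readerSchema = readerSchema;
        }

        public GenericDatumReader<GenericRecord> readerFor(Schema writerSchema) {
            long fp = SchemaNormalization.parsingFingerprint64(writerSchema);
            return cache.computeIfAbsent(
                    fp, k -> new GenericDatumReader<>(writerSchema, readerSchema));
        }
    }

BinaryDecoder instances can also be reused via the second argument of
DecoderFactory.get().binaryDecoder(bytes, reuse) to cut per-record
allocation.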