An example would probably help clarify things best! This is the sentence I find both interesting and confusing: "The deserialization was performed with the latest reader schema alongside the specific datum reader."
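To make that sentence concrete, here is a stdlib-only Java sketch of why decoding with only the reader schema can break when fields are reordered. It is a simplified model, not Avro's API; the class and method names are hypothetical. Avro's binary encoding writes a record's field values in the writer schema's declared order and stores no field names, so a reader that treats its own schema as the writer schema reads values positionally. Resolving against the actual writer schema matches fields by name instead. In real Avro the positional mismatch would more likely surface in the generated class's positional `put(int, Object)` casts (the `case` statements mentioned in the quoted message) than as the clean swap modelled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionalDecodeDemo {
    // Simplified stand-in for Avro binary encoding: values only, in writer-schema field order, no names.
    public static List<Object> encode(List<String> writerFields, Map<String, Object> record) {
        List<Object> out = new ArrayList<>();
        for (String field : writerFields) {
            out.add(record.get(field));
        }
        return out;
    }

    // Reader schema only: assumes the data was written in the reader's own field order.
    public static Map<String, Object> decodeReaderOnly(List<String> readerFields, List<Object> data) {
        Map<String, Object> rec = new LinkedHashMap<>();
        for (int i = 0; i < readerFields.size(); i++) {
            rec.put(readerFields.get(i), data.get(i)); // positional: wrong if the order changed
        }
        return rec;
    }

    // Writer + reader schema: read values in writer order, then match reader fields by name.
    public static Map<String, Object> decodeResolved(List<String> writerFields, List<String> readerFields,
                                                     List<Object> data) {
        Map<String, Object> byName = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            byName.put(writerFields.get(i), data.get(i));
        }
        Map<String, Object> rec = new LinkedHashMap<>();
        for (String field : readerFields) {
            // Avro would use the reader's default (or fail without one); this sketch just yields null.
            rec.put(field, byName.get(field));
        }
        return rec;
    }

    public static void main(String[] args) {
        List<String> writer = List.of("id", "name");
        List<String> reader = List.of("name", "id"); // same fields, different order
        Map<String, Object> original = Map.of("id", 42, "name", "ryan");
        List<Object> data = encode(writer, original);
        System.out.println(decodeReaderOnly(reader, data));       // {name=42, id=ryan}
        System.out.println(decodeResolved(writer, reader, data)); // {name=ryan, id=42}
    }
}
```

Reading with `reader` alone hands `42` to `name`; resolving against `writer` restores the intended mapping even though the two schemas are compatible.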
There are very few schema evolutions that let you swap in a new schema without providing the _exact_ writer schema that was used, but it seems like you're aware of that! What are the hacks you're applying?

If you think you've found a bug or a performance bottleneck in schema resolution, then please do raise a JIRA! Especially if you have a fix or ideas for improvements.

All my best,
Ryan

On Thu, Jul 23, 2020 at 1:32 AM Andy Le wrote:
>
> Hi Ryan,
>
> Would you please share your POC?
>
> Thank you.
>
> On 2020/07/22 22:54:04, Ryan Schachte wrote:
> > Hello everyone,
> > We are facing an interesting use-case with respect to Avro and
> > deserialization.
> >
> > As part of one of our systems, we get triggers to pull raw avro bytes out
> > of our data layer and deserialize them. For many months, we have never had
> > an issue with this. The deserialization was performed with the latest
> > reader schema alongside the specific datum reader.
> >
> > Recently, a schema change within one of the relevant objects was updated
> > and deemed backward-transitive from a registry perspective, however
> > deserialization began to fail. Diving deeper into this issue, it was
> > because the deserialization was explicitly casting fields based on the
> > field-level ordering of the object at the root level. To further clarify,
> > once we had compiled an adjacent object matching the avro schema, you can
> > notice that the fields in some of the case statements rely on this
> > ordering, which breaks our deserialization flow.
> >
> > To mitigate this issue, we have some hacks involving both the reader and
> > writer schema in tandem to perform deserialization, but doing this
> > operation on billions of records has destroyed a lot of our performance.
> >
> > My question is, how should we handle this situation on our end? I'm happy
> > to further elaborate on the problem and provide examples as well.
> >
> > Thanks so much,
> > Ryan Schachte
> >
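On the performance concern in the quoted message (resolving with both writer and reader schemas across billions of records): the resolution step does not have to be repeated per record. One common approach is to compute the writer-to-reader field mapping once per distinct writer schema and reuse it. In Avro terms that likely means reusing parsed writer `Schema` objects and a `SpecificDatumReader` built from the (writer, reader) pair, rather than re-parsing or rebuilding them for every record. The following stdlib-only sketch models that idea; the class, the cache keyed on the writer's field list (a stand-in for a real schema fingerprint), and the plan counter are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ResolverCacheDemo {
    private final List<String> readerFields;
    // Hypothetical cache keyed by the writer schema (here its field list; in practice a fingerprint).
    private final Map<List<String>, int[]> plans = new ConcurrentHashMap<>();
    private final AtomicInteger planCount = new AtomicInteger();

    public ResolverCacheDemo(List<String> readerFields) {
        this.readerFields = List.copyOf(readerFields);
    }

    // Build the reader-field -> writer-position mapping once per distinct writer schema.
    private int[] planFor(List<String> writerFields) {
        return plans.computeIfAbsent(writerFields, w -> {
            planCount.incrementAndGet();
            int[] idx = new int[readerFields.size()];
            for (int i = 0; i < idx.length; i++) {
                idx[i] = w.indexOf(readerFields.get(i)); // -1 if the writer lacks this field
            }
            return idx;
        });
    }

    // Per-record work is just array lookups against the cached plan.
    public Map<String, Object> decode(List<String> writerFields, List<Object> data) {
        int[] idx = planFor(writerFields);
        Map<String, Object> rec = new LinkedHashMap<>();
        for (int i = 0; i < idx.length; i++) {
            rec.put(readerFields.get(i), idx[i] >= 0 ? data.get(idx[i]) : null);
        }
        return rec;
    }

    public int plansBuilt() {
        return planCount.get();
    }

    public static void main(String[] args) {
        ResolverCacheDemo reader = new ResolverCacheDemo(List.of("name", "id"));
        List<String> writer = List.of("id", "name");
        for (int n = 0; n < 1_000; n++) {
            reader.decode(writer, List.of(n, "user" + n));
        }
        System.out.println(reader.plansBuilt());                       // 1
        System.out.println(reader.decode(writer, List.of(7, "ryan"))); // {name=ryan, id=7}
    }
}
```

After a thousand records only one plan has been built, so the per-record cost no longer includes resolution. A production cache would more plausibly key on something like a 64-bit schema fingerprint than on a field list.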
