Hi Ryan,

Would you please share your POC?
Thank you.

On 2020/07/22 22:54:04, Ryan Schachte <[email protected]> wrote:
> Hello everyone,
> We are facing an interesting use-case with respect to Avro and
> deserialization.
>
> As part of one of our systems, we get triggers to pull raw avro bytes out
> of our data layer and deserialize them. For many months, we have never had
> an issue with this. The deserialization was performed with the latest
> reader schema alongside the specific datum reader.
>
> Recently, a schema change within one of the relevant objects was updated
> and deemed backward-transitive from a registry perspective, however
> deserialization began to fail. Diving deeper into this issue, it was
> because the deserialization was explicitly casting fields based on the
> field-level ordering of the object at the root level. To further clarify,
> once we had compiled an adjacent object matching the avro schema, you can
> notice that the fields in some of the case statements rely on this
> ordering, which breaks our deserialization flow.
>
> To mitigate this issue, we have some hacks involving both the reader and
> writer schema in tandem to perform deserialization, but doing this
> operation on billions of records has destroyed a lot of our performance.
>
> My question is, how should we handle this situation on our end? I'm happy
> to further elaborate on the problem and provide examples as well.
>
> Thanks so much,
> Ryan Schachte
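
Until the POC is available, here is a rough sketch of the failure mode as I understand it from the description above. It is a toy model in plain Python, not the Avro library: schemas are reduced to ordered lists of field names, and a "payload" is a list of values in the writer's field order (Avro's binary encoding carries no field names, so field order is all the bytes record). The field names `id`, `name`, and `email` and the helpers `encode`, `decode_positional`, and `build_resolver` are all hypothetical, chosen only for illustration.

```python
# Toy model of Avro-style schema resolution. This is NOT the Avro API;
# every name below is hypothetical and exists only for illustration.

def encode(record, writer_fields):
    """Serialize a record as values in the writer schema's field order."""
    return [record[name] for name in writer_fields]


def decode_positional(values, reader_fields):
    """Fragile: assumes the data was written in the reader's field order."""
    return dict(zip(reader_fields, values))


def build_resolver(writer_fields, reader_fields, defaults=None):
    """Match reader fields to writer fields by name, once per schema pair.

    Returns a function that decodes a payload written with writer_fields
    into a record shaped like reader_fields.
    """
    defaults = defaults or {}
    writer_pos = {name: i for i, name in enumerate(writer_fields)}
    plan = []
    for name in reader_fields:
        if name in writer_pos:
            plan.append((name, writer_pos[name], None))
        elif name in defaults:
            # Field is new in the reader schema: fall back to its default.
            plan.append((name, None, defaults[name]))
        else:
            raise ValueError(f"no writer field or default for {name!r}")

    def resolve(values):
        return {
            name: (values[pos] if pos is not None else default)
            for name, pos, default in plan
        }

    return resolve


# Assumed schema evolution: fields reordered and one field added.
old_fields = ["id", "name"]
new_fields = ["name", "id", "email"]

payload = encode({"id": 7, "name": "ada"}, old_fields)

# Reading old data with only the new schema mixes up the fields.
broken = decode_positional(payload, new_fields)

# Reading with both schemas resolves fields by name instead of position.
resolve = build_resolver(old_fields, new_fields, defaults={"email": None})
fixed = resolve(payload)
```

In this sketch the name-to-position mapping is built once per (writer, reader) pair and then reused for every record, so the per-record work is a simple lookup. Whether caching along those lines would help with the slowdown you describe is only a guess on my part until we can see how your workaround builds its readers.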
