Hello everyone,

We are facing an interesting use case with Avro deserialization.
As part of one of our systems, we receive triggers to pull raw Avro bytes out of our data layer and deserialize them. For many months this worked without issue: deserialization was performed with the latest reader schema alongside the specific datum reader.

Recently, the schema for one of the relevant objects was updated in a way the registry deemed backward-transitive compatible, yet deserialization began to fail. Digging deeper, we found this was because the deserialization code was explicitly casting fields based on the field-level ordering of the object at the root level. To clarify: once we had compiled an adjacent object matching the Avro schema, we could see that the fields in some of the case statements rely on this ordering, which breaks our deserialization flow.

To mitigate this, we have some hacks that use the reader and writer schemas in tandem to perform deserialization, but doing this over billions of records has badly hurt our performance.

My question is: how should we handle this situation on our end? I'm happy to elaborate further on the problem and provide examples as well.

Thanks so much,
Ryan Schachte
