Greetings, Avro devs.

We've been using Avro for a short while now and have run into an issue with validation. A number of our schemas are quite large, and while working on getting data into the right shape for them, we've found the error messages produced for these large schemas pretty unhelpful.
In version 1.9.2, the error raised on a validation failure shows the full structure of the expected schema as well as the entire datum provided at the top level of validation. For large schemas this is of little value, since the part that is actually in error is likely to be one field somewhere in that pile of data.

To solve this problem locally, we've created an alternate form of validation that uses iteration and traversal to validate each node. If any node fails validation, the error raised contains only that specific node (datum and schema), which makes problems much easier to see.

I have noticed that 1.10 addresses this to some extent by adding the module constants _DEBUG_VALIDATE and _DEBUG_VALIDATE_INDENT. But it seems pretty clear that these are intended primarily for development. They don't really help at runtime.

There's another potential advantage to our approach. As an iterative process, it will use fewer system resources, especially when validating schemas with many nested levels.

I wanted to offer this new approach as a potential improvement, and I'd like to open a discussion of our code. I've got a working branch and am happy to open a PR against the Apache GitHub master if there's any chance of anyone being interested.

Thanks very much for reading this far. I hope you might be interested.

Yours,

Cris Ewing
Coffee Meets Bagel Engineering
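
To illustrate the idea of node-by-node validation described above, here is a minimal sketch in Python. It is not the code from the working branch and does not use the avro package. The names validate, NodeValidationError and PRIMITIVES are hypothetical, schemas are assumed to be plain dicts in Avro's JSON form, and only a few types are handled (unions, maps, enums and field defaults are left out).

```python
from collections import deque


class NodeValidationError(Exception):
    """Hypothetical error carrying only the failing node, not the whole datum."""

    def __init__(self, path, schema, datum):
        self.path, self.schema, self.datum = path, schema, datum
        super().__init__(f"{path}: {datum!r} does not match schema {schema!r}")


# Checks for a handful of primitive types; bool is kept out of the numerics.
PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
}


def validate(schema, datum):
    """Walk the schema with an explicit work queue instead of recursion."""
    pending = deque([("$", schema, datum)])
    while pending:
        path, node, value = pending.popleft()
        # A node is either a type name ("string") or a dict with a "type" key.
        kind = node.get("type") if isinstance(node, dict) else node
        ok = False  # anything this sketch does not understand is rejected
        if isinstance(kind, str) and kind in PRIMITIVES:
            ok = PRIMITIVES[kind](value)
        elif kind == "record" and isinstance(value, dict):
            ok = True
            for field in node["fields"]:
                name = field["name"]
                # Missing fields are queued as None and fail at their own node.
                pending.append((f"{path}.{name}", field["type"], value.get(name)))
        elif kind == "array" and isinstance(value, list):
            ok = True
            for i, item in enumerate(value):
                pending.append((f"{path}[{i}]", node["items"], item))
        if not ok:
            raise NodeValidationError(path, node, value)
    return True
```

With a record schema that has a string field "name" and an array-of-strings field "tags", validating {"name": "cris", "tags": ["a", 7]} raises an error whose path is $.tags[1], whose datum is 7 and whose schema is "string", rather than echoing the whole record and schema.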
