Hi, Cris, I'm happy to take a look.
On Thu, Jul 23, 2020 at 20:23 Cris Ewing <[email protected]> wrote:
> Greetings, avro devs.
>
> We've been using avro for a short while now and have run into an issue with
> validation. Our problem is that we have a number of schemas that are quite
> large. When working on getting data into the right shape for them, the
> format of error messages for these large schemas has been pretty
> unhelpful.
>
> In version 1.9.2, the error that is produced for validation errors shows
> the full structure of the expected schema as well as the entire datum
> provided at the top level of validation. For large schemas, this is of
> little value, since the part of the schema that is in error is likely to be
> one field somewhere in that pile of data.
>
> In order to solve this problem locally, we've created an alternate form of
> validation that uses iteration and traversal to validate each node. If any
> node fails validation, then the error raised contains that specific node
> (datum and schema) which improves the visibility of problems.
>
> I have noticed that in 1.10 this has been solved to some extent by adding
> the module constants _DEBUG_VALIDATE and _DEBUG_VALIDATE_INDENT. But it
> seems pretty clear that this is intended primarily for development. It
> doesn't really help at runtime.
>
> There's another potential advantage to our approach. As an iterative
> process, it will use fewer system resources, especially when validating
> schemas with a number of nested levels.
>
> I wanted to offer this new approach as a potential improvement and I am
> seeking to open a discussion of our code. I've got a working branch and am
> happy to open a PR against the apache github master if there's any chance
> of anyone being interested.
>
> Thanks very much for reading this far. I hope you might be interested.
>
> Yours,
>
> Cris Ewing
> Coffee Meets Bagel Engineering
>
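
To make sure we mean the same thing by "iteration and traversal", here is a minimal sketch of how node-by-node validation with a node-specific error could look. It is only an illustration under assumptions: the schema is a made-up plain-dict format, not avro's schema classes, and the names `validate`, `NodeValidationError`, and `PRIMITIVES` are hypothetical, taken neither from avro nor from your branch.

```python
from collections import deque

# Hypothetical, simplified schema format (plain dicts), NOT avro's Schema classes.
PRIMITIVES = {"string": str, "int": int, "boolean": bool}


class NodeValidationError(Exception):
    """Carries only the failing node: its path, its schema, and its datum."""

    def __init__(self, path, schema, datum):
        self.path = path
        self.schema = schema
        self.datum = datum
        super().__init__(f"{path}: datum {datum!r} does not match schema {schema!r}")


def validate(schema, datum):
    """Validate datum against schema using an explicit work queue, not recursion."""
    pending = deque([("$", schema, datum)])
    while pending:
        path, node_schema, node_datum = pending.popleft()
        kind = node_schema["type"]
        if kind in PRIMITIVES:
            ok = isinstance(node_datum, PRIMITIVES[kind])
            # bool is a subclass of int in Python, so reject it for "int".
            if kind == "int" and isinstance(node_datum, bool):
                ok = False
            if not ok:
                raise NodeValidationError(path, node_schema, node_datum)
        elif kind == "record":
            if not isinstance(node_datum, dict):
                raise NodeValidationError(path, node_schema, node_datum)
            for field in node_schema["fields"]:
                child_path = f"{path}.{field['name']}"
                if field["name"] not in node_datum:
                    # Missing field: report it with datum=None.
                    raise NodeValidationError(child_path, field["schema"], None)
                pending.append((child_path, field["schema"], node_datum[field["name"]]))
        elif kind == "array":
            if not isinstance(node_datum, list):
                raise NodeValidationError(path, node_schema, node_datum)
            for index, item in enumerate(node_datum):
                pending.append((f"{path}[{index}]", node_schema["items"], item))
        else:
            raise ValueError(f"unsupported schema type: {kind!r}")
    return True
```

With something of this shape, a bad value deep inside a large record would be reported as just that one node (path, expected schema, offending datum) instead of the whole top-level schema and datum. The explicit queue also means nesting depth does not grow the call stack.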
