Hi, Cris,

I'm happy to take a look.

On Thu, Jul 23, 2020 at 20:23 Cris Ewing <[email protected]> wrote:

> Greetings, Avro devs.
>
> We've been using Avro for a short while now and have run into an issue with
> validation.  Our problem is that we have a number of schemas that are quite
> large.  When working on getting data into the right shape for them, the
> format of error messages for these large schemas has been pretty
> unhelpful.
>
> In version 1.9.2, the error that is produced for validation errors shows
> the full structure of the expected schema as well as the entire datum
> provided at the top level of validation.  For large schemas, this is of
> little value, since the part of the schema that is in error is likely to be
> one field somewhere in that pile of data.
>
> In order to solve this problem locally, we've created an alternate form of
> validation that uses iteration and traversal to validate each node.  If any
> node fails validation, then the error raised contains that specific node
> (datum and schema) which improves the visibility of problems.
>
> I have noticed that in 1.10 this has been solved to some extent by adding
> the module constants _DEBUG_VALIDATE and _DEBUG_VALIDATE_INDENT.  But it
> seems pretty clear that this is intended primarily for development.  It
> doesn't really help at runtime.
>
> There's another potential advantage to our approach.  As an iterative
> process, it will use fewer system resources, especially when validating
> schemas with a number of nested levels.
>
> I wanted to offer this new approach as a potential improvement and I am
> seeking to open a discussion of our code.  I've got a working branch and am
> happy to open a PR against the Apache GitHub master if there's any chance
> of anyone being interested.
>
> Thanks very much for reading this far.  I hope you might be interested.
>
> Yours,
>
> Cris Ewing
> Coffee Meets Bagel Engineering
>
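
To make the proposal concrete for the thread, here is a minimal sketch of
node-by-node validation with an explicit stack, as described in the quoted
message.  It is not Cris's branch and does not use the avro package: the
names NodeValidationError, PRIMITIVES and validate_iteratively are
hypothetical, schemas are plain dicts in Avro's JSON form, and only
primitives, records and arrays are handled (no unions, maps, enums or
named-type references).

```python
class NodeValidationError(Exception):
    """Hypothetical error carrying only the failing node, not the whole schema."""

    def __init__(self, path, schema, datum):
        self.path = path
        self.schema = schema
        self.datum = datum
        super().__init__(f"{path}: {datum!r} does not match schema {schema!r}")


def _is_int(d):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(d, int) and not isinstance(d, bool)


# Assumed subset of Avro primitive types, mapped to simple Python checks.
PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": _is_int,
    "long": _is_int,
    "float": lambda d: _is_int(d) or isinstance(d, float),
    "double": lambda d: _is_int(d) or isinstance(d, float),
    "string": lambda d: isinstance(d, str),
    "bytes": lambda d: isinstance(d, bytes),
}


def validate_iteratively(schema, datum):
    """Validate datum against schema without recursion.

    Raises NodeValidationError for the first failing node that is visited.
    """
    stack = [("$", schema, datum)]  # (path, schema node, datum node)
    while stack:
        path, sch, d = stack.pop()
        # Treat {"type": "string"} the same as the bare name "string".
        if isinstance(sch, dict) and sch["type"] in PRIMITIVES:
            sch = sch["type"]
        if isinstance(sch, str):
            check = PRIMITIVES.get(sch)
            if check is None:
                raise ValueError(f"unsupported schema type in sketch: {sch!r}")
            if not check(d):
                raise NodeValidationError(path, sch, d)
        elif sch["type"] == "record":
            if not isinstance(d, dict):
                raise NodeValidationError(path, sch, d)
            for field in sch["fields"]:
                name = field["name"]
                # A missing field is checked as None against the field's type.
                stack.append((f"{path}.{name}", field["type"], d.get(name)))
        elif sch["type"] == "array":
            if not isinstance(d, list):
                raise NodeValidationError(path, sch, d)
            for i, item in enumerate(d):
                stack.append((f"{path}[{i}]", sch["items"], item))
        else:
            raise ValueError(f"unsupported schema type in sketch: {sch['type']!r}")
    return True
```

Because the pending work lives in a list rather than on the call stack,
nesting depth is bounded by available memory instead of the interpreter's
recursion limit, and the raised error names only the offending path, schema
node and datum.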
