Re: An alternative approach to validation in Python?

Cris Ewing Fri, 24 Jul 2020 13:35:21 -0700

Great Mike!  Thanks.  Here's a draft PR in github:
https://github.com/apache/avro/pull/936


On Thu, Jul 23, 2020 at 7:44 PM Michael A. Smith <[email protected]>
wrote:

> Hi, Cris,
>
> I'm happy to take a look.
>
> On Thu, Jul 23, 2020 at 20:23 Cris Ewing <[email protected]>
> wrote:
>
> > Greetings, avro devs.
> >
> > We've been using avro for a short while now and have run into an issue
> > with validation.  Our problem is that we have a number of schemas that are
> > quite large.  When working on getting data into the right shape for them,
> > the format of error messages for these large schemas has been pretty
> > unhelpful.
> >
> > In version 1.9.2, the error that is produced for validation errors shows
> > the full structure of the expected schema as well as the entire datum
> > provided at the top level of validation.  For large schemas, this is of
> > little value, since the part of the schema that is in error is likely to
> > be one field somewhere in that pile of data.
> >
> > In order to solve this problem locally, we've created an alternate form
> > of validation that uses iteration and traversal to validate each node.
> > If any node fails validation, then the error raised contains that specific
> > node (datum and schema), which improves the visibility of problems.
> >
> > I have noticed that in 1.10 this has been solved to some extent by adding
> > the module constants _DEBUG_VALIDATE and _DEBUG_VALIDATE_INDENT.  But it
> > seems pretty clear that this is intended primarily for development.  It
> > doesn't really help at runtime.
> >
> > There's another potential advantage to our approach.  As an iterative
> > process, it will use fewer system resources, especially when validating
> > schemas with a number of nested levels.
> >
> > I wanted to offer this new approach as a potential improvement and I am
> > seeking to open a discussion of our code.  I've got a working branch and
> > am happy to open a PR against the apache github master if there's any
> > chance of anyone being interested.
> >
> > Thanks very much for reading this far.  I hope you might be interested.
> >
> > Yours,
> >
> > Cris Ewing
> > Coffee Meets Bagel Engineering
> >
>
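
For readers of the archive, the node-by-node idea described in the quoted
message can be sketched roughly as follows.  This is only an illustration
under stated assumptions, not the code from the PR and not avro's API: the
names validate, NodeValidationError and PRIMITIVES are hypothetical, schemas
are plain dicts in Avro's JSON form, and only primitives, records and arrays
are handled (no unions, maps, enums or fixed).

```python
class NodeValidationError(Exception):
    """Hypothetical error type carrying only the failing node."""

    def __init__(self, path, schema, datum):
        self.path, self.schema, self.datum = path, schema, datum
        where = path or "<root>"
        super().__init__(f"{where}: datum {datum!r} does not match schema {schema!r}")


def _is_int(d):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(d, int) and not isinstance(d, bool)


def _is_number(d):
    return isinstance(d, (int, float)) and not isinstance(d, bool)


# Assumed mapping from Avro primitive names to Python-side checks.
PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": _is_int,
    "long": _is_int,
    "float": _is_number,
    "double": _is_number,
    "string": lambda d: isinstance(d, str),
    "bytes": lambda d: isinstance(d, bytes),
}


def validate(schema, datum):
    """Walk schema and datum together with an explicit stack (no recursion)."""
    stack = [("", schema, datum)]
    while stack:
        path, node, value = stack.pop()
        if isinstance(node, str):
            kind = node
        elif isinstance(node, dict) and isinstance(node.get("type"), str):
            kind = node["type"]
        else:
            kind = None  # unions and other shapes are out of scope here

        if kind in PRIMITIVES:
            if not PRIMITIVES[kind](value):
                raise NodeValidationError(path, node, value)
        elif kind == "record":
            if not isinstance(value, dict):
                raise NodeValidationError(path, node, value)
            for field in node["fields"]:
                name = field["name"]
                child = f"{path}.{name}" if path else name
                stack.append((child, field["type"], value.get(name)))
        elif kind == "array":
            if not isinstance(value, list):
                raise NodeValidationError(path, node, value)
            for i, item in enumerate(value):
                stack.append((f"{path}[{i}]", node["items"], item))
        else:
            # Unsupported schema shape in this sketch: report the node itself.
            raise NodeValidationError(path, node, value)
    return True
```

Because the error carries the path, the sub-schema and the offending value,
a failure deep inside a large record reports just that field instead of the
whole top-level schema and datum.  The sketch stops at the first failing
node it visits, and the visiting order follows the stack rather than the
field order.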
