[ https://issues.apache.org/jira/browse/AVRO-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180823#comment-17180823 ]
ASF subversion and git services commented on AVRO-2906:
-------------------------------------------------------
Commit efb12314b5acfea075a533368441c0f5a3b844d4 in avro's branch
refs/heads/master from Cris Ewing
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=efb1231 ]
AVRO-2906: Traversal validation (#936)
* AVRO-2906: Convert validation to a traversal-based approach
Use schema-type-specific iterators and validators to allow a
breadth-first traversal of a full schema, validating each node
along the way.
The benefit of this approach is that it allows us to pinpoint
the specific part of the schema that has failed validation.
Previously, the error message for a large schema printed the
entire datum and the full schema and said "this is not that".
The new approach prints the specific sub-schema that failed,
which makes errors more informative.
A second improvement is that by traversing the schema instead of
processing it recursively, the algorithm uses system resources
more efficiently. The difference matters most for schemas with
many nested parts.
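The traversal idea described above can be sketched roughly as follows. This is
only an illustration, not the code from the commit: schemas are simplified to
plain dicts that always carry a "type" key, the helper names (children,
check_node, validate, EXAMPLE_SCHEMA) and the "path" field on ValidationNode
are hypothetical, and TypeError stands in for AvroTypeException.

```python
from collections import deque, namedtuple

# Hypothetical node: a sub-schema, the datum to check against it, and a
# human-readable path used only for error reporting.
ValidationNode = namedtuple("ValidationNode", ["schema", "datum", "path"])

# Simplified primitive checks (assumption: only a few Avro types shown).
_PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, float),
    "string": lambda d: isinstance(d, str),
}


def check_node(node):
    """Validate one node without looking at its children."""
    kind = node.schema["type"]
    if kind == "record":
        return isinstance(node.datum, dict)
    if kind == "array":
        return isinstance(node.datum, list)
    return _PRIMITIVES[kind](node.datum)


def children(node):
    """Yield the child nodes of an already-validated node."""
    schema, datum, path = node
    if schema["type"] == "record":
        for field in schema["fields"]:
            name = field["name"]
            yield ValidationNode(field["type"], datum.get(name), path + "." + name)
    elif schema["type"] == "array":
        for index, item in enumerate(datum):
            yield ValidationNode(schema["items"], item, "%s[%d]" % (path, index))


def validate(schema, datum):
    """Breadth-first validation using an explicit queue instead of recursion."""
    queue = deque([ValidationNode(schema, datum, "$")])
    while queue:
        node = queue.popleft()
        if not check_node(node):
            # Report only the failing sub-schema and its location.
            raise TypeError(
                "datum %r at %s does not match schema %r"
                % (node.datum, node.path, node.schema)
            )
        queue.extend(children(node))
    return True


EXAMPLE_SCHEMA = {
    "type": "record",
    "fields": [
        {"name": "name", "type": {"type": "string"}},
        {"name": "tags", "type": {"type": "array", "items": {"type": "int"}}},
    ],
}
```

Because the pending nodes live in a queue rather than on the call stack, the
depth of the schema does not affect recursion depth, and a failure names the
one sub-schema and location that did not match.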
Make the required changes to pass tests in all supported Python versions.
This commit removes type hints present in the first commit in order to
allow using the code in older Python versions.
In addition:
* the use of `str` has been replaced by the compatible `unicode`.
* the ValidationNode namedtuple has been re-expressed in syntax available
in all supported Python versions.
* the use of a custom InvalidEvent exception has been replaced by using
AvroTypeException
* all specific single-type validators have been replaced by partials of
_validate_type with a tuple of one or more type objects.
Fix typos and raise StopIteration as suggested in code review
Move the responsibility for validation to the Schema class.
Each schema subclass will be responsible for its own validation. This
simplifies the structure of io.py, removes the dict lookup of validators,
and reduces somewhat the repetition that was in io.py.
Move validators to a class attribute and update method code.
This makes things look a little bit cleaner than having the validators right in
the midst of the method.
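A minimal sketch of what "each schema subclass validates itself, with
validators held in a class attribute" might look like. The class layout,
constructor arguments, and the _validators name are assumptions made for
illustration and do not reproduce the classes in avro's schema module.

```python
class Schema:
    """Base class: every concrete schema type supplies its own validate()."""

    def validate(self, datum):
        raise NotImplementedError


class PrimitiveSchema(Schema):
    # Validators live on the class, not inside the method body.
    _validators = {
        "null": lambda d: d is None,
        "boolean": lambda d: isinstance(d, bool),
        "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
        "string": lambda d: isinstance(d, str),
    }

    def __init__(self, type_name):
        self.type = type_name

    def validate(self, datum):
        validator = self._validators.get(self.type)
        return validator is not None and validator(datum)


class FixedSchema(Schema):
    def __init__(self, size):
        self.size = size

    def validate(self, datum):
        # A fixed value must be bytes of exactly the declared size.
        return isinstance(datum, bytes) and len(datum) == self.size
```

With this layout a caller only needs schema.validate(datum); there is no
external lookup table keyed on the schema type.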
Add arg spec docs to docstring for base Schema class.
Clean up mistakes.
* Fix a docstring to be a more accurate statement of reality.
* Remove an unused import.
* Remove extra blank lines.
> Replace recursive validation with traversal-based solution for Python avro
> --------------------------------------------------------------------------
>
> Key: AVRO-2906
> URL: https://issues.apache.org/jira/browse/AVRO-2906
> Project: Apache Avro
> Issue Type: Improvement
> Components: python
> Environment: GitHub PR: https://github.com/apache/avro/pull/936
> Reporter: Cristopher Ewing
> Priority: Major
>
> The existing validation scheme for the Python implementation of avro is
> recursive. This is problematic in Python because the interpreter caps
> recursion depth and does not optimize tail calls, and because recursion is
> inefficient for more deeply nested schemas. Another issue with the current
> scheme is that error reporting for validation problems is generic. Unless a
> global variable in code is changed to allow errors for sub-schemas to be
> reported directly, the only report one gets is an exception that says the
> entire schema is invalid, which is not particularly useful when hunting bugs
> in serializing very large schemas.
> My proposal is to replace this existing validation approach with a new
> approach that uses breadth-first traversal of the schema for validation. This
> approach avoids the inefficiencies of recursion and, at the same time, allows
> errors to be reported for the exact spot in the overall schema where
> they happened.
> My implementation, in this GitHub PR
> (https://github.com/apache/avro/pull/936), also moves validation from
> a mapping of type/logical_type to lambda functions into a validate method on
> each schema type, ensuring that a schema is responsible for validating
> itself.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)