[ https://issues.apache.org/jira/browse/AVRO-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cristopher Ewing updated AVRO-2906:
-----------------------------------
Description:
The existing validation scheme for the Python implementation of Avro is
recursive. This is problematic in Python because the language limits recursion
depth and does not optimize recursive calls, and because recursion is
inefficient for deeply nested schemas. Another issue with the current scheme is
that error reporting for validation problems is generic. Unless a global
variable in the code is changed to allow errors for sub-schemas to be reported
directly, the only report one gets is an exception saying that the entire
schema is invalid, which is not particularly useful when hunting bugs in
serializing very large schemas.
My proposal is to replace the existing validation approach with one that
validates by breadth-first traversal of the schema. This avoids the
inefficiencies of recursion and, at the same time, allows errors to be reported
for the exact spot in the overall schema where they occurred.
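To illustrate the idea, here is a minimal sketch of queue-based, breadth-first validation that records a path for every failure. It is not the code from the PR: schemas are modelled as plain strings and dicts, and the names validate and PRIMITIVE_CHECKS are hypothetical, chosen only for this example.

```python
from collections import deque

# Hypothetical, simplified primitive checks; not the avro library's own table.
PRIMITIVE_CHECKS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
}


def validate(schema, datum):
    """Validate datum breadth-first; return a list of (path, message) errors."""
    errors = []
    # Each queue entry pairs a sub-schema with the value it must match,
    # plus the path that leads to that value from the root ("$").
    queue = deque([("$", schema, datum)])
    while queue:
        path, sch, value = queue.popleft()
        if isinstance(sch, str):
            if not PRIMITIVE_CHECKS[sch](value):
                found = type(value).__name__
                errors.append((path, f"expected {sch}, got {found}"))
        elif sch["type"] == "record":
            if not isinstance(value, dict):
                errors.append((path, "expected record"))
                continue
            for field in sch["fields"]:
                name = field["name"]
                queue.append((f"{path}.{name}", field["type"], value.get(name)))
        elif sch["type"] == "array":
            if not isinstance(value, list):
                errors.append((path, "expected array"))
                continue
            for i, item in enumerate(value):
                queue.append((f"{path}[{i}]", sch["items"], item))
        else:
            raise ValueError(f"unsupported schema at {path}: {sch!r}")
    return errors


schema = {
    "type": "record",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}
print(validate(schema, {"id": 1, "tags": ["a", 2]}))
```

Because pending work lives in an explicit queue rather than on the call stack, the nesting depth of the schema does not affect the interpreter's recursion limit, and each error carries the path of the sub-schema that rejected the value.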
My implementation, in [this PR on
GitHub|https://github.com/apache/avro/pull/936], also moves validation from a
mapping of type/logical_type to lambda functions into a validate method on each
schema type, ensuring that a schema is responsible for validating itself.
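As a rough sketch of what "a schema validates itself" could look like, the classes below each carry their own validate method, and a small iterative driver walks the schema tree. All class names, the children helper, and first_error are assumptions made for illustration; the actual class layout and signatures in the PR may differ.

```python
from collections import deque


class IntSchema:
    def validate(self, datum):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)

    def children(self, datum):
        return []


class StringSchema:
    def validate(self, datum):
        return isinstance(datum, str)

    def children(self, datum):
        return []


class ArraySchema:
    def __init__(self, items):
        self.items = items

    def validate(self, datum):
        return isinstance(datum, list)

    def children(self, datum):
        # (path suffix, sub-schema, sub-datum) for every element.
        return [(f"[{i}]", self.items, item) for i, item in enumerate(datum)]


class RecordSchema:
    def __init__(self, fields):
        self.fields = fields  # mapping of field name -> schema object

    def validate(self, datum):
        return isinstance(datum, dict) and all(n in datum for n in self.fields)

    def children(self, datum):
        return [(f".{n}", s, datum[n]) for n, s in self.fields.items()]


def first_error(schema, datum):
    """Walk the schema breadth-first; return the first error message or None."""
    queue = deque([("$", schema, datum)])
    while queue:
        path, node, value = queue.popleft()
        # Each schema object decides for itself whether the value fits.
        if not node.validate(value):
            return f"{path}: {value!r} is not valid for {type(node).__name__}"
        queue.extend((path + suffix, child, item)
                     for suffix, child, item in node.children(value))
    return None


schema = RecordSchema({"id": IntSchema(), "tags": ArraySchema(StringSchema())})
print(first_error(schema, {"id": 1, "tags": ["a", 2]}))
```

The driver needs no lookup table keyed by type: adding a new schema type only means giving it validate and children methods.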
was:
The existing validation scheme for the Python implementation of avro is
recursive. This is problematic in Python because language support for
recursion is not great, and because for more deeply nested schemas, recursion
is inefficient. Another issue with the current scheme is that error reporting
for validation problems is generic. Unless a global variable in code is changed
to allow errors for sub-schemas to be reported directly, the only report one
gets is an exception that says the entire schema is invalid, which is not
particularly useful when hunting bugs in serializing very large schemas.
My proposal is to replace this existing validation approach with a new approach
that uses breadth-first traversal of the schema for validation. The approach
solves the inefficiencies of recursion, and at the same time, allows for errors
to be reported for the exact spot in the over-all schema where they happened.
My implementation, in this PR in github
> Replace recursive validation with traversal-based solution for Python avro
> --------------------------------------------------------------------------
>
> Key: AVRO-2906
> URL: https://issues.apache.org/jira/browse/AVRO-2906
> Project: Apache Avro
> Issue Type: Improvement
> Components: python
> Environment: [github pr|https://github.com/apache/avro/pull/936]
> Reporter: Cristopher Ewing
> Priority: Major