GitHub user tjwp opened a pull request:
https://github.com/apache/avro/pull/230
Ruby encoding performance improvements
This change includes several optimizations to the validation performed
during encoding in the Ruby implementation. For a use case with a few levels of
nesting and unions in several places within the schema, we saw a 5x improvement
in encoding performance with these changes.
The main changes are:
1. Avoid the exhaustive validation of schemas in a union. Previously a
datum was tested against all schemas in a union even though the failures were
unused if a compatible schema was found. Now validation stops when the first
compatible schema is found, but all failures are still available if there is no
compatible type.
2. Avoid the repeated validation of nested schemas. Previously, the datum
was recursively validated against the schema prior to encoding. Then during
encoding, each complex field (record, array, map, union) was recursively
validated again. Thus each field was validated a number of times equal to its
level of nesting plus one. This change introduces an option for validation not
to recurse. Since encoding proceeds recursively, validation is instead
performed as each level is encoded.
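To illustrate the two main changes, here is a minimal sketch using a toy validator. It is not the Avro gem's code: `ToyValidator`, its schema representation (symbols and plain hashes), and the `recursive:` keyword are all hypothetical stand-ins. The union check returns as soon as one branch matches and only reports the collected failures when no branch does; the record check skips its fields when `recursive` is false, on the assumption that the encoder will validate each field as it descends.

```ruby
# Hypothetical toy validator for illustration only; NOT the Avro gem's API.
# Schemas are plain Ruby values: :int, :string, :null,
# { union: [schema, ...] } or { record: { "field" => schema } }.
module ToyValidator
  module_function

  # Returns an array of failure descriptions; an empty array means valid.
  def failures(schema, datum, recursive: true)
    case schema
    when :int    then datum.is_a?(Integer) ? [] : ["expected int"]
    when :string then datum.is_a?(String) ? [] : ["expected string"]
    when :null   then datum.nil? ? [] : ["expected null"]
    when Hash
      if schema.key?(:union)
        union_failures(schema[:union], datum, recursive)
      else
        record_failures(schema[:record], datum, recursive)
      end
    else
      raise ArgumentError, "unknown schema: #{schema.inspect}"
    end
  end

  # Stops at the first compatible branch. Failures from earlier branches
  # are only returned when no branch matches.
  def union_failures(branches, datum, recursive)
    collected = []
    branches.each do |branch|
      result = failures(branch, datum, recursive: recursive)
      return [] if result.empty?
      collected.concat(result)
    end
    collected
  end

  # Without recursion only the container type is checked; the fields are
  # left for the (assumed) encoder to validate level by level.
  def record_failures(fields, datum, recursive)
    return ["expected record"] unless datum.is_a?(Hash)
    return [] unless recursive

    fields.flat_map do |name, field_schema|
      failures(field_schema, datum[name], recursive: true)
    end
  end
end
```

With this sketch, `ToyValidator.failures({ record: { "id" => :int } }, { "id" => "oops" }, recursive: false)` reports nothing, while the default recursive call reports the bad field.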
Other minor improvements:
- delay creating error messages until they are required
- use explicit code instead of dynamic code (e.g. `&method(:is_a?)`)
- additional use of constants
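The first two minor items can be sketched as follows. The names here (`any_kind_dynamic?`, `any_kind_explicit?`, `LazyFailure`) are hypothetical and do not come from the patch; the sketch only shows the general shape of each idea. The dynamic form builds a `Method` object and converts it to a proc on every call, whereas the explicit block calls `is_a?` directly. The lazy failure stores a block and builds its message only when someone asks for it.

```ruby
# Hypothetical helpers for illustration; not taken from the Avro gem.

# Dynamic form: creates a Method object and converts it to a proc each call.
def any_kind_dynamic?(datum, classes)
  classes.any?(&datum.method(:is_a?))
end

# Explicit form: same result, with a plain block and a direct method call.
def any_kind_explicit?(datum, classes)
  classes.any? { |klass| datum.is_a?(klass) }
end

# Defers building an error message until it is actually read.
class LazyFailure
  def initialize(&builder)
    @builder = builder
  end

  # Builds the message on first access and caches it afterwards.
  def message
    @message ||= @builder.call
  end
end
```

If a union branch fails but a later branch matches, a failure object like this is discarded without its message string ever being built.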
The only additional tests in this change demonstrate that validation
without recursion returns the same results for "simple" fields and no
validation errors for complex fields that would require recursion.
The updated `Avro::Schema.validate` and `Avro::SchemaValidator.validate!`
methods take an options hash with the new `:recursive` option. An options hash
was chosen in anticipation of eventually combining this change with logical
type support (https://github.com/apache/avro/pull/116), which would add an
option specifying whether the datum is already `:encoded`.
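As a rough sketch of why an options hash leaves room for later additions, the helper below merges caller-supplied options over a set of defaults. The helper name, the constant, and the default values are assumptions made for illustration; the actual defaults and handling in the patch may differ.

```ruby
# Hypothetical defaults for illustration; the real patch may differ.
DEFAULT_VALIDATION_OPTIONS = { recursive: true, encoded: false }.freeze

# Merges caller options over the defaults and rejects unknown keys, so a
# new option such as :encoded can be added without changing the signature.
def resolve_validation_options(options = {})
  unknown = options.keys - DEFAULT_VALIDATION_OPTIONS.keys
  raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

  DEFAULT_VALIDATION_OPTIONS.merge(options)
end
```

A caller that wants per-level validation would pass `recursive: false` and leave every other option at its default.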
These changes have been tested against the following Ruby versions:
- 1.9.3-p551
- 2.0.0-p648
- 2.1.10
- 2.2.7
- 2.3.4
- 2.4.1
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/salsify/avro ruby-validation-perf
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/avro/pull/230.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #230
----
commit 97b350457b74a4b79b591f4e3d9b439a347fc5d7
Author: Tim Perkins <[email protected]>
Date: 2017-06-12T16:34:59Z
Ruby encoding performance improvements
----