GitHub user tjwp opened a pull request:

    https://github.com/apache/avro/pull/230

    Ruby encoding performance improvements

    This change includes several optimizations of the validation performed 
during encoding in Ruby. For a use case with a few levels of nesting and 
unions in several places within the schema, we saw a 5x improvement in encoding 
performance with these changes.
    
    The main changes are:
    
    1. Avoid the exhaustive validation of schemas in a union. Previously a 
datum was tested against all schemas in a union even though the failures were 
unused if a compatible schema was found. Now validation stops when the first 
compatible schema is found, but all failures are still available if there is no 
compatible type.
    
    2. Avoid the repeated validation of nested schemas. Previously, the datum 
was recursively validated against the schema prior to encoding. Then during 
encoding, each complex field (record, array, map, union) was recursively 
validated again. Thus each field was validated a number of times equal to its 
level of nesting plus one. This change introduces an option for validation not 
to recurse. Since encoding proceeds recursively, validation is instead 
performed as each level is encoded.
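    
    To illustrate point 2, here is a minimal sketch that counts how often each 
node is validated. It is a toy model, not the Avro gem's code: `ToyValidator`, 
the hash-based schema format, and the `recursive:` keyword are all assumptions 
made for the example.

```ruby
# Toy model (not Avro's API): a schema is :int, :string,
# or { record: { field_name => field_schema } }.
class ToyValidator
  attr_reader :counts

  def initialize
    @counts = Hash.new(0) # path => number of times that node was validated
  end

  def valid?(schema, datum, recursive:, path: "root")
    @counts[path] += 1
    case schema
    when :int    then datum.is_a?(Integer)
    when :string then datum.is_a?(String)
    when Hash
      return false unless datum.is_a?(Hash)
      return true unless recursive # shallow check only; children are checked later

      schema[:record].all? do |name, field_schema|
        valid?(field_schema, datum[name], recursive: true, path: "#{path}.#{name}")
      end
    else
      false
    end
  end

  # "Encoding" here only walks the structure, validating each node as it is reached.
  def encode(schema, datum, recursive:, path: "root")
    unless valid?(schema, datum, recursive: recursive, path: path)
      raise ArgumentError, "invalid datum at #{path}"
    end
    return unless schema.is_a?(Hash)

    schema[:record].each do |name, field_schema|
      encode(field_schema, datum[name], recursive: recursive, path: "#{path}.#{name}")
    end
  end
end

SCHEMA = { record: { id: :int, inner: { record: { name: :string } } } }.freeze
DATUM  = { id: 1, inner: { name: "a" } }.freeze

# Recursive validation at every level: a node at depth d is validated d + 1 times.
OLD_STYLE = ToyValidator.new
OLD_STYLE.encode(SCHEMA, DATUM, recursive: true)
# OLD_STYLE.counts => {"root"=>1, "root.id"=>2, "root.inner"=>2, "root.inner.name"=>3}

# Non-recursive validation at every level: each node is validated exactly once.
NEW_STYLE = ToyValidator.new
NEW_STYLE.encode(SCHEMA, DATUM, recursive: false)
# NEW_STYLE.counts => {"root"=>1, "root.id"=>1, "root.inner"=>1, "root.inner.name"=>1}
```

    In this model the total work grows with depth only in the recursive case, 
which is the effect the change is meant to remove.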
    
    Other minor improvements:
    - delay creating error messages until they are required
    - use explicit code instead of dynamic method references such as `&method(:is_a?)`
    - additional use of constants
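    
    The union change (point 1) and the delayed error messages can be sketched 
together. This is only an approximation of the idea: `TYPE_CHECKS`, `CHECKED`, 
and `first_compatible_type` are hypothetical names invented for the example 
and do not exist in the Avro gem.

```ruby
# Illustrative only: these names are hypothetical, not the Avro gem's API.
TYPE_CHECKS = {
  "null"   => ->(datum) { datum.nil? },
  "int"    => ->(datum) { datum.is_a?(Integer) },
  "string" => ->(datum) { datum.is_a?(String) }
}.freeze

CHECKED = [] # records which union branches were actually tested

# Returns the first compatible type name. Failed branches are remembered by
# name only; the error message string is built only if nothing matches.
def first_compatible_type(union_types, datum)
  failed = []
  union_types.each do |type|
    CHECKED << type
    return type if TYPE_CHECKS.fetch(type).call(datum) # stop at the first match

    failed << type
  end
  raise TypeError, "#{datum.inspect} is not compatible with any of: #{failed.join(', ')}"
end

UNION   = %w[int null string].freeze
MATCHED = first_compatible_type(UNION, nil)
# MATCHED => "null"; CHECKED => ["int", "null"] ("string" was never tested)
```

    When no branch matches, every failed branch is still reported, so the 
failure case loses no information compared with exhaustive validation.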
    
    The only additional tests in this change demonstrate that validation 
without recursion returns the same results for "simple" fields and no 
validation errors for complex fields that would require recursion.
    
    The updated methods for `Avro::Schema.validate` and 
`Avro::SchemaValidator.validate!` were implemented to take an options hash with 
the new `:recursive` option in anticipation of eventually being combined with 
logical type support (https://github.com/apache/avro/pull/116) which would 
specify whether the datum is already `:encoded`.
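    
    As a rough sketch of the options-hash style, the following standalone 
module accepts a `:recursive` option with a default. The module name 
`OptionsSketch`, the toy schema format, and the default value are assumptions 
for illustration; the real method signatures are in the pull request itself.

```ruby
# Hypothetical stand-in, not the Avro gem: a schema is :int or { array: item_schema }.
module OptionsSketch
  DEFAULT_OPTIONS = { recursive: true }.freeze

  def self.validate(schema, datum, options = {})
    opts = DEFAULT_OPTIONS.merge(options) # unknown future keys pass through untouched
    case schema
    when :int
      datum.is_a?(Integer)
    when Hash
      return false unless datum.is_a?(Array)
      return true unless opts[:recursive] # complex type: skip the items

      datum.all? { |item| validate(schema[:array], item, opts) }
    else
      false
    end
  end
end

ARRAY_OF_INT = { array: :int }.freeze

OptionsSketch.validate(ARRAY_OF_INT, [1, "x"])                   # => false
OptionsSketch.validate(ARRAY_OF_INT, [1, "x"], recursive: false) # => true
OptionsSketch.validate(:int, "x", recursive: false)              # => false
```

    The last two calls mirror what the added tests check: without recursion, 
simple types give the same result as before, while complex types report no 
errors for problems that only recursion would find.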
    
    These changes have been tested against:
      - 1.9.3-p551
      - 2.0.0-p648
      - 2.1.10
      - 2.2.7
      - 2.3.4
      - 2.4.1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/salsify/avro ruby-validation-perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230
    
----
commit 97b350457b74a4b79b591f4e3d9b439a347fc5d7
Author: Tim Perkins <[email protected]>
Date:   2017-06-12T16:34:59Z

    Ruby encoding performance improvements

----


