[
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. resolved AVRO-2274.
---------------------------------------
Resolution: Fixed
Merged the PR. Thank you [~raymie].
> Improve resolving performance when schemas don't change
> -------------------------------------------------------
>
> Key: AVRO-2274
> URL: https://issues.apache.org/jira/browse/AVRO-2274
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> This change adds decoding optimizations based on the observation that
> schemas don't change very much. We add special-case paths to optimize the
> case where a _sub_schema is the same in the reader and the writer. The
> specific cases are:
> * In the case of an enumeration, if the reader and writer schemas are the
> same, then we can simply return the tag written by the writer rather than
> "adjust" it as if it might have been re-ordered. In fact, we can do this
> (directly return the tag written by the writer) as long as the reader schema
> is an "extension" of the writer's, in that it may have added new symbols but
> hasn't renumbered any of the writer's symbols. Leaving an enumeration
> unchanged or "extending" it as defined here are the common ways enumerations
> evolve. (Our tests show this optimization improves performance by about 3%.)
> * When the reader and writer subschemas are both unions, resolution is
> expensive: we have an outer union preceded by a "writer-union action", but
> each branch of this outer union consists of union-adjust actions, which are
> heavyweight. We optimize this case when the reader and writer unions are
> the same: we fall back on the standard grammar used for a union, avoiding all
> these adjustments. Since unions are commonly used to encode "nullable"
> fields in Avro, and nullability rarely changes as a schema evolves, this
> optimization should help many users. (Our tests show this optimization
> improves performance by 25-30%, a significant win.)
> * The "custom code" generated for reading records has to read fields in a
> loop that uses a switch statement to deal with writers that may have
> re-ordered fields. In most cases, however, fields have not been reordered
> (esp. in more complex records with many record sub-schemas). So we've added
> a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a
> variant of the existing readFieldOrder. If the field order has indeed
> changed, then readFieldOrderIfDiff returns the new field order, just like
> readFieldOrder does. However, if the field order hasn't changed, then
> readFieldOrderIfDiff returns null. We then modified the generation of
> custom-decoders for records to add a special-case path that simply reads the
> record's fields in order, without incurring the overhead of the loop or the
> switch statement. (Our tests show this optimization improves performance by
> 8-9%, on top of the 35-40% produced by the original custom-coder
> optimization.)
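
The enumeration and field-order cases described above can be sketched in plain Java. This is an illustration only and does not use the Avro API: the class ResolveFastPaths and the helpers isExtension, resolveTag, fieldOrderIfDiff and readRecord are hypothetical names. The sketch assumes that enum symbols are given as ordered lists and that the writer's field order is an array mapping each wire position to a reader field position. Only the null-when-unchanged contract of readFieldOrderIfDiff is taken from the description.

```java
import java.util.Arrays;
import java.util.List;

class ResolveFastPaths {

    // Hypothetical helper: the reader enum "extends" the writer enum when the
    // writer's symbols come first in the reader's list, in the same order.
    static boolean isExtension(List<String> writerSymbols, List<String> readerSymbols) {
        return readerSymbols.size() >= writerSymbols.size()
            && readerSymbols.subList(0, writerSymbols.size()).equals(writerSymbols);
    }

    // Maps a writer tag to a reader tag. A real resolver would likely make the
    // extension check once per schema pair, not once per value (assumption).
    static int resolveTag(int writerTag, List<String> writerSymbols, List<String> readerSymbols) {
        if (isExtension(writerSymbols, readerSymbols)) {
            return writerTag; // fast path: no adjustment needed
        }
        // Slow path: look the symbol up by name (-1 if the reader lacks it).
        return readerSymbols.indexOf(writerSymbols.get(writerTag));
    }

    // Mimics the described contract: null when the writer's order matches the
    // reader's, otherwise the writer's order. writerOrder[i] is the reader
    // position of the i-th field on the wire.
    static int[] fieldOrderIfDiff(int[] writerOrder) {
        for (int i = 0; i < writerOrder.length; i++) {
            if (writerOrder[i] != i) {
                return writerOrder;
            }
        }
        return null;
    }

    // Decodes a three-field record from values in wire order.
    static String[] readRecord(int[] writerOrder, String[] wire) {
        String[] record = new String[3];
        int[] order = fieldOrderIfDiff(writerOrder);
        if (order == null) {
            // Special-case path: straight-line reads, no loop and no switch.
            record[0] = wire[0];
            record[1] = wire[1];
            record[2] = wire[2];
        } else {
            // General path: loop plus switch to handle re-ordered fields.
            for (int i = 0; i < 3; i++) {
                switch (order[i]) {
                    case 0: record[0] = wire[i]; break;
                    case 1: record[1] = wire[i]; break;
                    case 2: record[2] = wire[i]; break;
                    default: throw new IllegalStateException("unknown field " + order[i]);
                }
            }
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> writer = Arrays.asList("RED", "GREEN");
        List<String> reader = Arrays.asList("RED", "GREEN", "BLUE");
        System.out.println(resolveTag(1, writer, reader));
        System.out.println(Arrays.toString(
            readRecord(new int[] {2, 0, 1}, new String[] {"c", "a", "b"})));
    }
}
```

Both paths in readRecord produce the same record. The only difference is that the special-case path skips the per-field dispatch when the order is unchanged.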
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)