Raymie Stata created AVRO-2274:
----------------------------------
Summary: Improve resolving performance when schemas don't change
Key: AVRO-2274
URL: https://issues.apache.org/jira/browse/AVRO-2274
Project: Apache Avro
Issue Type: Improvement
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
This issue adds decoding optimizations based on the observation that schemas
don't change very much. We add special-case paths for the case where a
_sub_schema of the reader is the same as the corresponding subschema of the
writer. The specific cases are:
* In the case of an enumeration, if the reader and writer are the same, then we
can simply return the tag written by the writer rather than "adjust" it as if
it might have been re-ordered. In fact, we can do this (directly return the
tag written by the writer) as long as the reader-schema is an "extension" of
the writer's in that it may have added new symbols but hasn't renumbered any of
the writer's symbols. Leaving an enumeration unchanged, or "extending" it as
defined here, covers the common ways enumerations evolve. (Our
tests show this optimization improves performance by about 3%.)
* When the reader and writer subschemas are both unions, resolution is
expensive: we have an outer union preceded by a "writer-union action", but each
branch of this outer union consists of union-adjust actions, which are
heavyweight. We optimize this case when the reader and writer unions are the same:
we fall back on the standard grammar used for a union, avoiding all these
adjustments. Since unions are commonly used to encode "nullable" fields in
Avro, and nullability rarely changes as a schema evolves, this optimization
should help many users. (Our tests show this optimization improves performance
by 25-30%, a significant win.)
* The "custom code" generated for reading records has to read fields in a loop
that uses a switch statement to deal with writers that may have re-ordered
fields. In most cases, however, fields have not been reordered (esp. in more
complex records with many record sub-schemas). So we've added a new method to
ResolvingDecoder called readFieldOrderIfDiff, which is a variant of the
existing readFieldOrder. If the field order has indeed changed, then
readFieldOrderIfDiff returns the new field order, just like readFieldOrder
does. However, if the field order hasn't changed, then readFieldOrderIfDiff
returns null. We then modified the generation of custom-decoders for records
to add a special-case path that simply reads the record's fields in order,
without incurring the overhead of the loop or the switch statement. (Our tests
show this optimization improves performance by 8-9%, on top of the 35-40%
produced by the original custom-coder optimization.)
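
To make the enumeration and field-order cases above concrete, here is a minimal
sketch in plain Java. It does not use the Avro API: schemas are reduced to lists
of symbol or field names, and the class ResolutionFastPaths and the helpers
isEnumExtension, fieldOrderIfDiff and readRecord are hypothetical names invented
for illustration. Only the "return null when nothing changed" contract is taken
from the description of readFieldOrderIfDiff; the real method works on the
resolving grammar, not on name lists. The sketch also assumes the reader and
writer records have the same set of fields.

```java
import java.util.List;

// Illustrative only: not part of Avro. Schemas are modeled as name lists.
class ResolutionFastPaths {

    // True when the reader enum is an "extension" of the writer enum:
    // every writer symbol keeps its position, and the reader may append more.
    // In that case the tag written by the writer can be returned unadjusted.
    static boolean isEnumExtension(List<String> writerSymbols,
                                   List<String> readerSymbols) {
        if (readerSymbols.size() < writerSymbols.size()) {
            return false;
        }
        for (int i = 0; i < writerSymbols.size(); i++) {
            if (!writerSymbols.get(i).equals(readerSymbols.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical analogue of readFieldOrderIfDiff on field names.
    // order[i] is the reader position of the i-th field the writer wrote.
    // Returns null when the writer order equals the reader order.
    static int[] fieldOrderIfDiff(List<String> writerFields,
                                  List<String> readerFields) {
        if (writerFields.size() != readerFields.size()) {
            throw new IllegalArgumentException("sketch assumes identical field sets");
        }
        int[] order = new int[writerFields.size()];
        boolean changed = false;
        for (int i = 0; i < order.length; i++) {
            order[i] = readerFields.indexOf(writerFields.get(i));
            if (order[i] < 0) {
                throw new IllegalArgumentException("unknown field: " + writerFields.get(i));
            }
            changed |= order[i] != i;
        }
        return changed ? order : null;
    }

    // Places values, given in writer order, into reader order.
    // A null order takes the fast path and skips per-field dispatch.
    static Object[] readRecord(Object[] written, int[] order) {
        if (order == null) {
            return written.clone(); // fast path: fields are already in reader order
        }
        Object[] out = new Object[written.length];
        for (int i = 0; i < written.length; i++) {
            out[order[i]] = written[i]; // slow path: route each value individually
        }
        return out;
    }
}
```

In generated code the fast path would be a straight-line sequence of field reads
rather than a clone, but the shape of the decision is the same: test for null
once per record, then skip the loop and the switch.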