[
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699872#comment-16699872
]
ASF GitHub Bot commented on AVRO-2274:
--------------------------------------
rstata commented on a change in pull request #393: AVRO-2274 Improve resolving
performance when schemas don't change.
URL: https://github.com/apache/avro/pull/393#discussion_r236507147
##########
File path: lang/java/avro/src/main/java/org/apache/avro/io/parsing/Symbol.java
##########
@@ -490,10 +511,20 @@ public static EnumAdjustAction enumAdjustAction(int rsymCount, Object[] adj) {
   }
   public static class EnumAdjustAction extends IntCheckAction {
+    public final boolean noAdjustments;
     public final Object[] adjustments;
     @Deprecated public EnumAdjustAction(int rsymCount, Object[] adjustments) {
       super(rsymCount);
       this.adjustments = adjustments;
+      boolean noAdj = true;
+      if (adjustments != null) {
+        int count = Math.min(rsymCount, adjustments.length);
+        noAdj = (adjustments.length <= rsymCount);
+        for (int i = 0; noAdj && i < count; i++)
+          noAdj &= ((adjustments[i] instanceof Integer)
Review comment:
I think you highlighted this as an illustration, rather than as a "todo"
item, so I'm resolving the comment.
----------------------------------------------------------------
> Improve resolving performance when schemas don't change
> -------------------------------------------------------
>
> Key: AVRO-2274
> URL: https://issues.apache.org/jira/browse/AVRO-2274
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> This issue adds decoding optimizations based on the observation that schemas
> don't change very much. We add special-case paths to optimize the case where
> a _sub_schema is the same in the reader and the writer. The specific cases
> are:
> * In the case of an enumeration, if the reader and writer schemas are the
> same, then we can simply return the tag written by the writer rather than
> "adjust" it as if it might have been re-ordered. In fact, we can do this
> (directly return the tag written by the writer) as long as the reader schema
> is an "extension" of the writer's, in that it may have added new symbols but
> hasn't renumbered any of the writer's symbols. Leaving an enumeration
> unchanged or "extending" it as defined here are the common ways enumerations
> evolve. (Our tests show this optimization improves performance by about 3%.)
> * When the reader and writer subschemas are both unions, resolution is
> expensive: we have an outer union preceded by a "writer-union action", but
> each branch of this outer union consists of union-adjust actions, which are
> heavyweight. We optimize this case when the reader and writer unions are
> the same: we fall back on the standard grammar used for a union, avoiding all
> these adjustments. Since unions are commonly used to encode "nullable"
> fields in Avro, and nullability rarely changes as a schema evolves, this
> optimization should help many users. (Our tests show this optimization
> improves performance by 25-30%, a significant win.)
> * The "custom code" generated for reading records has to read fields in a
> loop that uses a switch statement to deal with writers that may have
> re-ordered fields. In most cases, however, fields have not been reordered
> (especially in more complex records with many record sub-schemas). So we've added
> a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a
> variant of the existing readFieldOrder. If the field order has indeed
> changed, then readFieldOrderIfDiff returns the new field order, just like
> readFieldOrder does. However, if the field order hasn't changed, then
> readFieldOrderIfDiff returns null. We then modified the generation of
> custom-decoders for records to add a special-case path that simply reads the
> record's fields in order, without incurring the overhead of the loop or the
> switch statement. (Our tests show this optimization improves performance by
> 8-9%, on top of the 35-40% produced by the original custom-coder
> optimization.)
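
The "extension" condition in the enumeration case can be illustrated with a minimal sketch. This is not the code from the pull request: the class EnumExtensionCheck and the method isExtension are hypothetical names, and the check works directly on symbol lists rather than on the adjustments array used by EnumAdjustAction. It only shows the idea that the writer's tag can be returned unchanged when every writer symbol keeps its ordinal in the reader's symbol list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Avro or of pull request #393.
public class EnumExtensionCheck {

    /**
     * Returns true when the reader's symbol list is an "extension" of the
     * writer's: every writer symbol appears at the same ordinal in the reader,
     * and the reader may have appended new symbols after them.
     */
    public static boolean isExtension(List<String> writerSymbols, List<String> readerSymbols) {
        if (writerSymbols.size() > readerSymbols.size()) {
            return false; // the writer has symbols the reader cannot hold at the same ordinal
        }
        for (int i = 0; i < writerSymbols.size(); i++) {
            if (!writerSymbols.get(i).equals(readerSymbols.get(i))) {
                return false; // a writer symbol was renumbered or renamed
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> writer = Arrays.asList("RED", "GREEN");
        // Reader appended BLUE: the writer's tags can be used as-is.
        System.out.println(isExtension(writer, Arrays.asList("RED", "GREEN", "BLUE"))); // true
        // Reader re-ordered the symbols: tags must be adjusted.
        System.out.println(isExtension(writer, Arrays.asList("GREEN", "RED"))); // false
    }
}
```

When the check returns false, a decoder would presumably fall back to the existing adjustment path.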
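
The record case can be sketched in the same spirit. The code below does not use the Avro API: fieldOrderIfDiff is a hypothetical stand-in for the null-returning contract the description gives for readFieldOrderIfDiff, field positions are modeled as plain ints, and the input is a simple queue of already-decoded values. It shows only the shape of the generated decoder, with a straight-line path when the order is unchanged and the loop-plus-switch path otherwise.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical illustration only; it models the control flow, not Avro's classes.
public class FieldOrderFastPath {

    public static final class Point {
        public int x;
        public int y;
        public String label;
    }

    /**
     * Stand-in for the contract described for readFieldOrderIfDiff: returns
     * null when the writer's field order matches the reader's, otherwise
     * returns the writer's order as reader field positions.
     */
    public static int[] fieldOrderIfDiff(int[] writerOrder) {
        for (int i = 0; i < writerOrder.length; i++) {
            if (writerOrder[i] != i) {
                return writerOrder;
            }
        }
        return null;
    }

    public static Point decode(Deque<Object> in, int[] writerOrder) {
        Point p = new Point();
        int[] order = fieldOrderIfDiff(writerOrder);
        if (order == null) {
            // Fast path: fields arrive in reader order, so no loop or switch is needed.
            p.x = (Integer) in.poll();
            p.y = (Integer) in.poll();
            p.label = (String) in.poll();
        } else {
            // General path: dispatch on each field position in the writer's order.
            for (int pos : order) {
                switch (pos) {
                    case 0: p.x = (Integer) in.poll(); break;
                    case 1: p.y = (Integer) in.poll(); break;
                    case 2: p.label = (String) in.poll(); break;
                    default: throw new IllegalStateException("unknown field position " + pos);
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        Deque<Object> sameOrder = new ArrayDeque<Object>(Arrays.<Object>asList(1, 2, "a"));
        Point p = decode(sameOrder, new int[] {0, 1, 2});
        System.out.println(p.x + " " + p.y + " " + p.label);
    }
}
```

Both paths produce the same record; the fast path only avoids the per-field dispatch.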