[
https://issues.apache.org/jira/browse/AVRO-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Felix GV updated AVRO-2307:
---------------------------
Labels: performance (was: )
> Opt-in setting to improve GC behavior during deserialization?
> -------------------------------------------------------------
>
> Key: AVRO-2307
> URL: https://issues.apache.org/jira/browse/AVRO-2307
> Project: Apache Avro
> Issue Type: Improvement
> Components: compatibility, java
> Affects Versions: 1.7.7
> Reporter: Felix GV
> Priority: Minor
> Labels: performance
>
> We have a performance-sensitive project that leverages Avro for an online
> data store, and I wanted to report on a couple of Avro deserialization
> optimizations we have implemented. On one hand, it is great that Avro’s code
> is clean and modular enough to have allowed us to achieve this easily. But on
> the other hand, we are leveraging parts of the API which are probably not
> typically used by most users, and thus we are exposing ourselves to ongoing
> maintenance costs as those “ambiguously-public” APIs might change in future
> versions. For this reason, I wanted to gauge the appetite of the Avro
> community for taking in those optimizations upstream into the main project.
> The minor challenge is that the optimizations we’ve made are not completely
> invisible, and therefore should probably be presented as opt-in settings,
> rather than new defaults. Below is a summary of both changes.
> *1. Re-use of byte arrays when instantiating ByteBuffers*
> When deserializing a byte array that contains a ByteBuffer field, the
> relevant portion of the input byte array is copied into a new byte array,
> which is then used as the backing array of a new ByteBuffer. This is
> completely safe, but appears to be inefficient from our point of view,
> requiring extra CPU cycles and generating garbage.
> In our case, we have a few schemas which contain some general metadata and an
> opaque byte array payload, which often ends up being a significant portion of
> the total byte length. Recopying these bytes results in up to 2x the byte
> allocation (assuming the schema contained just a ByteBuffer field and nothing
> else). The ByteBuffer API, however, provides an alternative behavior where
> the backing array can be larger than needed, with an offset and length
> provided to indicate the internal boundaries of the payload. In our
> implementation, we re-use the input byte array as the ButeBuffer’s backing
> array, therefore avoiding both the allocation and the copy.
> The caveat in this case is that this only works safely for use cases that
> don’t mutate the content of the bytes (neither the input nor the deserialized
> object). In our case this assumption is valid.
> If this was implemented in the open-source project, there are a few ways this
> could be achieved:
> # We could return a special read-only ByteBuffer implementation that throws
> an exception if any mutation is attempted, indicating that this special mode
> ought to be turned off to support mutations.
> # We could return a modified ByteBuffer implementation which defers the copy
> of the content lazily until (and only if) one of the mutation API is called.
> More user-friendly, perhaps, but could introduce hidden overhead which the
> application designer might prefer to be alerted to rather than being silently
> shoved under the rug.
> Either way, this probably requires a custom ByteBuffer implementation with
> special behavior in order to be cleaner. In both cases, however, I don’t see
> a way to guard against mutations to the input byte array, therefore, this
> should probably be an opt-in setting for users who know what they’re doing
> and are comfortable with the read-only assumption in their workloads.
> *2. Primitive (non-boxing) implementation of lists*
> Another challenge we’ve come across is that lists of primitive types (floats
> in our case) are always boxed into Object Floats by Avro. In our case, this
> resulted in millions of Objects / second getting allocated, causing
> pathological GC pressure.
> To solve this, we have implemented an alternative version of the Avro Array
> class, but which instead of hanging on to an array of generically-typed
> Objects, internally, hangs on to an array of primitive floats. This causes no
> boxing at deserialization time, but there is a further challenge which is
> that since Avro array fields are exposed as instances of the Java List
> interface, the regular functions of the API all return Objects, therefore
> merely deferring the boxing to a slightly later point in time. To get around
> this further complication, we have added a getPrimitive(index) function which
> returns primitive items directly. In order to be able to use this more
> optimized function, it is necessary for us to cast the list into our own
> type, otherwise we wouldn’t see the new function. The end result is quite
> dramatic, performance-wise, reducing our p99 latencies down to a quarter to a
> third of their original values, depending on workloads.
> One challenge here is that the “PrimitiveFloatArray” class is an almost
> complete copy of Avro’s Array class, basically just stripping away the
> generics and adding the primitive-returning function. If we were to
> contribute this upstream to the open-source project, I imagine we might want
> to do this not only for floats but for boolean, int, long and double arrays
> as well. This would mean roughly 5x the same copy-pasted implementation,
> which is not ideal from a maintenance standpoint. The generic types are nicer
> in that sense, but unfortunately, Java generics do not support primitives
> (AFAIK, unless one of the latest Java releases introduced support for this
> recently?). In our case, we are willing to pay that maintenance cost in
> exchange for the dramatic GC reduction it gives us, but I don’t know if the
> Avro project would come to the same conclusion.
> If implemented, this could potentially be the new default behavior, though
> boxing would still happen (merey deferred) unless users casted the lists into
> the new types and used the new primitive-returning functions. In use cases
> where an entire primitive array is deserialized but only a subset of its
> items are accessed, this boxing deferral could still provide benefits,
> however.
> —
> So again, I re-iterate: the goal of this ticket is to share what we’ve found
> and what we’ve done about it, with the intent to understand if there would be
> interest in reviewing and merging a patch which provided the above opt-in
> benefits.
> Thanks for your feedback.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)