[
https://issues.apache.org/jira/browse/AVRO-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061039#comment-13061039
]
James Baldassari commented on AVRO-853:
---------------------------------------
I just hammered out a quick patch that uses the second approach I described
earlier, in which we traverse the entire schema graph and cache the local state
at each node in the graph. The local cache is invalidated upon mutation. I
ran a test in which I called hashCode() on a record schema 10,000 times and
measured the amount of time required to complete all hashCode() calls. This
patch decreased the run time of the test by a little over 27% (from 537ms down
to 389ms). It isn't a slam dunk, but I guess it's an improvement. I haven't
had a chance yet to test that mutating the schema causes the hash code to be
updated. I also haven't added the Field aliases to hashCode() and equals(),
but if this looks like a good approach I can work on the remaining tasks.
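
For readers following along, here is a rough sketch of what "cache the local state at each node and invalidate on mutation" could look like. It is not the attached patch: SchemaNode, setName, addChild and localComputations are hypothetical names invented for illustration, and the sketch assumes an acyclic tree (a recursive schema would need a guard against infinite recursion). The traversal still visits every node on each hashCode() call; only the per-node work is cached. equals() is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a schema node; this is NOT Avro's Schema class. */
final class SchemaNode {
    private String name;
    private final List<SchemaNode> children = new ArrayList<>();

    // Cached hash of this node's own ("local") state, here just the name.
    private int localHash;
    private boolean localHashValid;

    // Counts how often the local hash was actually recomputed (for inspection).
    int localComputations;

    SchemaNode(String name) {
        this.name = name;
    }

    /** Mutating local state invalidates only this node's cached local hash. */
    void setName(String name) {
        this.name = name;
        localHashValid = false;
    }

    /** Children are traversed on every hashCode() call, so nothing to invalidate. */
    void addChild(SchemaNode child) {
        children.add(child);
    }

    private int localHash() {
        if (!localHashValid) {
            localHash = name.hashCode();
            localHashValid = true;
            localComputations++;
        }
        return localHash;
    }

    /** Walks the whole tree, combining each node's cached local hash. */
    @Override
    public int hashCode() {
        int h = localHash();
        for (SchemaNode child : children) {
            h = 31 * h + child.hashCode();
        }
        return h;
    }

    public static void main(String[] args) {
        SchemaNode root = new SchemaNode("root");
        SchemaNode child = new SchemaNode("child");
        root.addChild(child);

        int before = root.hashCode();
        child.setName("renamed");   // invalidates the child's local cache only
        int after = root.hashCode();

        System.out.println("before=" + before + " after=" + after
                + " childLocalComputations=" + child.localComputations);
    }
}
```

Because each call re-walks the tree, a mutation anywhere is picked up on the next hashCode() without having to notify parent nodes, at the price of keeping the traversal cost.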
> Cache hash codes in Schema and Field
> ------------------------------------
>
> Key: AVRO-853
> URL: https://issues.apache.org/jira/browse/AVRO-853
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.5.1
> Reporter: Douglas Kaminsky
> Attachments: AVRO-853-approach2.patch, AVRO-853.patch
>
>
> We are experiencing a serious performance degradation when trying to
> store/retrieve fields and schemas in hash-based data structures (e.g.
> HashMap). Since all fields and schemas are immutable (with the exception of
> RecordSchema allowing deferred setting of Fields) it makes sense to cache the
> hash code on the object instead of recalculating every time the hashCode
> method gets called.
> (Are there other mutable Schema sub-types that I'm not thinking about?)