[ 
https://issues.apache.org/jira/browse/AVRO-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060970#comment-13060970
 ] 

Doug Cutting commented on AVRO-853:
-----------------------------------

It's okay to use schemas as Map keys, but it's an error to change a schema 
after putting it in a Map.  We don't detect that error.  If schemas were 
immutable, that error would be impossible.
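
To illustrate the undetected error described above, here is a minimal sketch. 
ToySchema is a hypothetical stand-in whose hash code depends on a mutable 
field list; it is not the Avro Schema class, and all names are assumptions 
made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    // Hypothetical mutable stand-in for a schema (NOT org.apache.avro.Schema).
    static final class ToySchema {
        final String name;
        final List<String> fields = new ArrayList<>();

        ToySchema(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ToySchema)) return false;
            ToySchema that = (ToySchema) o;
            return name.equals(that.name) && fields.equals(that.fields);
        }

        // The hash code depends on the mutable field list.
        @Override public int hashCode() {
            return 31 * name.hashCode() + fields.hashCode();
        }
    }

    // Returns whether the key is still found after being mutated in place.
    static boolean foundAfterMutation() {
        Map<ToySchema, String> map = new HashMap<>();
        ToySchema key = new ToySchema("User");
        map.put(key, "value");
        key.fields.add("id");          // changes hashCode() while the key is in the map
        return map.containsKey(key);   // lookup now uses a different hash
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + foundAfterMutation());
    }
}
```

Nothing throws here; the entry simply becomes unreachable through its own 
key, which is why the mistake goes unnoticed.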

I'm not too worried about using zero to indicate 'invalid hash'.  A more random 
value might be a bit better, but a collision with this value is not fatal; it 
just makes the cache ineffective for those few schemas whose hashcode is that 
value.  We could instead have a flag to indicate that the hashcode is unset, or 
use a boxed Integer, both of which would use more memory without benefit in 
2^32-1 cases.  Whatever value we use should probably be a constant.  We can 
explicitly check when calculateHashCode returns this value and use a different 
value when it occurs.
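
The approach in the paragraph above might look roughly like the following 
sketch.  It is not the attached patch: the class CachedHash, the constants 
NO_HASHCODE and REPLACEMENT, and the computations counter are hypothetical 
names chosen for illustration; only calculateHashCode comes from the 
discussion.

```java
public class CachedHashDemo {
    // Hypothetical immutable stand-in; not the Avro Schema class.
    static final class CachedHash {
        static final int NO_HASHCODE = 0;   // sentinel: hash code not yet computed
        static final int REPLACEMENT = 1;   // used when the real hash equals the sentinel

        private final String content;
        private int hashCode = NO_HASHCODE;
        int computations = 0;               // counts real computations, for the demo only

        CachedHash(String content) { this.content = content; }

        int calculateHashCode() {
            computations++;
            return content.hashCode();
        }

        @Override public int hashCode() {
            int h = hashCode;
            if (h == NO_HASHCODE) {
                h = calculateHashCode();
                // Explicitly check for the sentinel and substitute a different value,
                // so that even this case is cached after the first call.
                if (h == NO_HASHCODE) h = REPLACEMENT;
                hashCode = h;
            }
            return h;
        }

        @Override public boolean equals(Object o) {
            return o instanceof CachedHash && content.equals(((CachedHash) o).content);
        }
    }

    public static void main(String[] args) {
        CachedHash empty = new CachedHash("");  // "".hashCode() is 0, the sentinel
        empty.hashCode();
        empty.hashCode();
        System.out.println("computations: " + empty.computations);
    }
}
```

Substituting a fixed value keeps equal objects consistent, since every object 
whose computed hash equals the sentinel maps to the same replacement.  The 
sketch ignores thread safety beyond the benign race on the int field, and the 
demo counter is not synchronized.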

In summary, I think adding a hashcode cache to Schemas would be a good thing 
to do.  I think this patch could be improved a bit, but I'm generally in favor 
of committing something like this soon.

> Cache hash codes in Schema and Field
> ------------------------------------
>
>                 Key: AVRO-853
>                 URL: https://issues.apache.org/jira/browse/AVRO-853
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.1
>            Reporter: Douglas Kaminsky
>         Attachments: AVRO-853.patch
>
>
> We are experiencing a serious performance degradation when trying to 
> store/retrieve fields and schemas in hash-based data structures (e.g., 
> HashMap). Since all fields and schemas are immutable (with the exception of 
> RecordSchema, which allows deferred setting of Fields), it makes sense to 
> cache the hash code on the object instead of recalculating it every time the 
> hashCode method gets called. 
> (Are there other mutable Schema sub-types that I'm not thinking about?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
