[ 
https://issues.apache.org/jira/browse/AVRO-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062220#comment-13062220
 ] 

Doug Cutting commented on AVRO-853:
-----------------------------------

As for aliases, two record schemas whose names differ should not be considered 
equal via aliases, since they'll read into records of different types.

Two record schemas whose names match but which have different sets of aliases 
will be able to read different files and be involved in different protocols; 
they'll behave differently and thus should not be treated as equal.

Properties are similar: Java reflection uses a property to determine whether to 
read an array into some kind of a List or into a raw array.  So two schemas 
that differ only in properties may read the same data into non-equivalent data 
structures and thus these schemas should not be considered equal.

Also, if something is used by equals() then, if possible, it should be 
incorporated into hashCode().  We cannot predict when there might be a large 
number of items that differ only in this aspect, and such items would 
otherwise all share one hash code.
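
To make the points above concrete, here is a minimal sketch of an equals() and 
hashCode() pair that both take name, aliases and properties into account.  
SketchRecordSchema and its fields are hypothetical stand-ins invented for this 
illustration; this is not Avro's actual Schema code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a record schema; NOT Avro's real Schema class.
final class SketchRecordSchema {
    private final String name;
    private final Set<String> aliases;
    private final Map<String, String> props;

    SketchRecordSchema(String name, Set<String> aliases, Map<String, String> props) {
        this.name = name;
        // Defensive copies so later changes by the caller cannot affect equality.
        this.aliases = new HashSet<>(aliases);
        this.props = new HashMap<>(props);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchRecordSchema)) return false;
        SketchRecordSchema that = (SketchRecordSchema) o;
        // Name, aliases and properties all change behavior, so all three count.
        return name.equals(that.name)
            && aliases.equals(that.aliases)
            && props.equals(that.props);
    }

    @Override
    public int hashCode() {
        // Everything compared in equals() also feeds the hash.
        return Objects.hash(name, aliases, props);
    }
}
```

With this sketch, two schemas named the same but carrying different aliases or 
different properties compare as unequal, while schemas that agree on all three 
are equal and share a hash code.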

I think this patch is basically the right approach for now.  Since mutation 
after storing in a Map or Set is already an error we cannot detect, perhaps we 
should not worry about invalidation at all.
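
For illustration only, caching without invalidation might look roughly like the 
sketch below.  CachedHashSchema, setFields and computeCount are hypothetical 
names made up for this example and are not taken from either attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record-like schema with deferred field setting;
// NOT Avro's real RecordSchema and not the code from the patch.
final class CachedHashSchema {
    private final String name;
    private List<String> fieldNames = new ArrayList<>();
    private int cachedHash;   // 0 means "not computed yet"
    int computeCount;         // illustration only: counts real computations

    CachedHashSchema(String name) {
        this.name = name;
    }

    // Deferred setting of fields; deliberately does NOT reset cachedHash.
    void setFields(List<String> names) {
        this.fieldNames = new ArrayList<>(names);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashSchema)) return false;
        CachedHashSchema that = (CachedHashSchema) o;
        return name.equals(that.name) && fieldNames.equals(that.fieldNames);
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            // First call (or a hash that happens to be 0): compute and remember.
            computeCount++;
            h = 31 * name.hashCode() + fieldNames.hashCode();
            cachedHash = h;
        }
        return h;
    }
}
```

Because nothing invalidates the cache, calling setFields() after the first 
hashCode() call leaves a stale value behind.  That is the same mistake as 
mutating an object already stored in a Map or Set, so the sketch assumes 
callers finish setting fields before hashing.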

> Cache hash codes in Schema and Field
> ------------------------------------
>
>                 Key: AVRO-853
>                 URL: https://issues.apache.org/jira/browse/AVRO-853
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.1
>            Reporter: Douglas Kaminsky
>         Attachments: AVRO-853-approach2.patch, AVRO-853.patch
>
>
> We are experiencing a serious performance degradation when trying to 
> store/retrieve fields and schemas in hash-based data structures (eg. 
> HashMap). Since all fields and schemas are immutable (with the exception of 
> RecordSchema allowing deferred setting of Fields) it makes sense to cache the 
> hash code on the object instead of recalculating every time the hashCode 
> method gets called. 
> (Are there other mutable Schema sub-types that I'm not thinking about?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
