[ 
https://issues.apache.org/jira/browse/AVRO-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775951#action_12775951
 ] 

Doug Cutting commented on AVRO-182:
-----------------------------------

> I think the equals() thing is a bug. 

It was intentional.  I think of it as a feature, but that's probably 
idiosyncratic.  The canonical use of equals() is in HashMap.  With TreeMap, if 
you try to mix classes, you'll probably get ClassCastException from the 
compareTo implementation.  But, with a HashMap, if you get a ClassCastException 
from an equals() implementation it's a bug?

We could add 'if (!(that instanceof Record)) return false;' to the front of all 
of these (I had it at one point and removed it).  It makes the code bigger and 
slower and may make it harder to find uses of null pointers and incorrect 
types, but, on the other hand, it may permit some uses that would not otherwise 
be permitted.  I lean towards smaller code that's finicky about types to 
greater flexibility.  But if you still think it's a bug, I'll change it.

> Do you feel it beneficial to include the hashcode of the schema in the 
> hashcode as well?

No.  It would make hashCode slower.  It would prevent false-positives for data 
that has the same leaf values in the same order but in different types, but I 
think such cases are rare.  Most Java hashCode implementations do not, e.g., 
include the hash of the class name, just the instance data.

> it seems like extracting the logic for combining two hash codes into a 
> function might be nice.

That would be nice.  I'm leery of varrargs, since they'll allocate an array per 
call, and hashCode should be wicked fast.  We might instead add a method like:

{code}
public int incrementHash(int hashCode, Object o, Schema s) {
  return 31*hashCode + hashCode(o, s)
}
{code}

Do you like that?  Do you think we should throw in an xor or a shift?

> You chose GenericArray to delegate to GenericData [ ... ]

I'm not sure what you mean.  GenericArray is the interface that 
GenericDatumReader and GenericDatumWriter use.  So one can change one's array 
implementation, and, as long as one implements GenericArray, one can use 
GenericDatumReader and GenericDatumWriter unchanged.  The standard 
implementation of GenericArray is GenericData.Array.  But if one does not like 
the GenericArray interface, then one can override methods in GenericDatumReader 
and GenericDatumWriter to use an arbitrary class to represent arrays and use 
GenericArray at all.

GenericData has code that's shared by GenericDatumReader and 
GenericDatumWriter, or that's otherwise common to generic data.  GenericData is 
a singleton, so that its methods may be overidden by singleton subclasses 
SpecificData and ReflectData.  In this way, SpecificDatumReader and 
SpecificDatumWriter can, e.g., share much of the GenericDatumReader and 
GenericDatumWriter logic, with minimal code duplication.

So, back to your question, I don't see that GenericArray delegates to 
GenericData, but maybe the explanations above help to better explain the 
structure of this stuff?

> hashCode and equals are not consistent with compareTo
> -----------------------------------------------------
>
>                 Key: AVRO-182
>                 URL: https://issues.apache.org/jira/browse/AVRO-182
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.3.0
>
>         Attachments: AVRO-182.patch
>
>
> Java's specific and generic APIs implement compareTo according to the schema, 
> where some fields might be ignored.  To be consistent, fields that are 
> ignored when comparing for ordering should also be ignored when comparing for 
> equality and for computing hashCodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to