[
https://issues.apache.org/jira/browse/AVRO-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063377#comment-13063377
]
Douglas Kaminsky commented on AVRO-853:
---------------------------------------
My only concern is losing hash keys. Fixing the hash code the first time
hashCode is called would be enough to ensure that later lookups of the exact
same object still work, but it would prevent using the hash to "duck-type" a
schema.
For example, suppose you first encounter this schema:
{code}
{
  "type": "record",
  "name": "foo",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "boolean"}
  ]
}
{code}
You hash the value, then later encounter this schema:
{code}
{
  "type": "record",
  "name": "foo",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "boolean"}
  ],
  "some.property": "propvalue"
}
{code}
Your use case may or may not depend on these things being considered equal...
Should there be a second equality method that favors content over structure
without forcing the end user to compare the schema internals? Using schemaA and
schemaB for the above, something like:
schemaA.equals(schemaB);     // false
schemaA.quacksLike(schemaB); // true -- yes, stupid name, please pick something better
That nicely solves the equality problem by letting the end user decide whether
they care about form or substance.
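A rough sketch of the two-method idea, assuming a made-up SketchSchema class
(not the real Avro Schema API) that holds a name, a list of "name:type" field
strings, and a property map. quacksLike is still only a placeholder name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a record schema; this is not Avro's Schema class.
final class SketchSchema {
    private final String name;
    private final List<String> fields;       // e.g. "a:long", "b:boolean"
    private final Map<String, String> props; // e.g. "some.property" -> "propvalue"

    SketchSchema(String name, List<String> fields, Map<String, String> props) {
        this.name = name;
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.props = Collections.unmodifiableMap(new HashMap<>(props));
    }

    // Strict equality: name, fields, and properties must all match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchSchema)) return false;
        SketchSchema other = (SketchSchema) o;
        return quacksLike(other) && props.equals(other.props);
    }

    // Ignores props, so schemas that quack alike land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(name, fields);
    }

    // Lenient equality: same name and fields, properties ignored.
    boolean quacksLike(SketchSchema other) {
        return other != null && name.equals(other.name) && fields.equals(other.fields);
    }
}

final class DuckTypeDemo {
    public static void main(String[] args) {
        List<String> fields = Arrays.asList("a:long", "b:boolean");
        Map<String, String> noProps = Collections.emptyMap();
        Map<String, String> someProps = Collections.singletonMap("some.property", "propvalue");
        SketchSchema schemaA = new SketchSchema("foo", fields, noProps);
        SketchSchema schemaB = new SketchSchema("foo", fields, someProps);
        System.out.println(schemaA.equals(schemaB));     // false
        System.out.println(schemaA.quacksLike(schemaB)); // true
    }
}
```

Because hashCode leaves the properties out, it stays consistent with equals
while still letting a caller use the hash to find duck-type candidates.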
As to the original issue, I'd just reiterate that although I agree it is ideal
for all "equality" components to play a role in hashing, it is not and should
not be a requirement, as long as there is no significant performance impact. I
feel sufficient documentation (i.e. "hey, btw, you will get poor hashing
performance if you try to use 50,000 copies of the boolean schema with
different properties as hash keys") would be enough to assuage most developers'
concerns.
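To illustrate the trade-off, here is a minimal sketch with a made-up
PropSchema class (again an assumption, not Avro code) whose cached hash code
ignores properties while equals does not. The cache field name and the "id"
property are invented for the example, and 1,000 keys stand in for the 50,000
mentioned above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not Avro's Schema: a type name plus arbitrary properties.
final class PropSchema {
    private final String type;
    private final Map<String, String> props;
    private int cachedHash; // 0 means "not computed yet"

    PropSchema(String type, Map<String, String> props) {
        this.type = type;
        this.props = new HashMap<>(props); // defensive copy
    }

    // Equality still takes the properties into account.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PropSchema)) return false;
        PropSchema other = (PropSchema) o;
        return type.equals(other.type) && props.equals(other.props);
    }

    // Computed on first use, then reused. Properties are left out on purpose:
    // equal objects still get equal hash codes, but objects that differ only
    // in their properties all collide.
    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            h = type.hashCode(); // a genuine 0 is simply recomputed on each call
            cachedHash = h;
        }
        return h;
    }
}

final class HashCollisionDemo {
    public static void main(String[] args) {
        Map<PropSchema, Integer> table = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            Map<String, String> props = new HashMap<>();
            props.put("id", Integer.toString(i));
            table.put(new PropSchema("boolean", props), i);
        }
        Map<String, String> probe = new HashMap<>();
        probe.put("id", "42");
        // Lookups stay correct; they are only slower because all keys share a bucket.
        System.out.println(table.size());
        System.out.println(table.get(new PropSchema("boolean", probe)));
    }
}
```

The map still behaves correctly here; the cost is purely in lookup time, which
is the point the documentation note would need to make.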
> Cache hash codes in Schema and Field
> ------------------------------------
>
> Key: AVRO-853
> URL: https://issues.apache.org/jira/browse/AVRO-853
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.5.1
> Reporter: Douglas Kaminsky
> Attachments: AVRO-853-approach2.patch, AVRO-853.patch
>
>
> We are experiencing a serious performance degradation when trying to
> store/retrieve fields and schemas in hash-based data structures (e.g.
> HashMap). Since all fields and schemas are immutable (with the exception of
> RecordSchema allowing deferred setting of Fields) it makes sense to cache the
> hash code on the object instead of recalculating every time the hashCode
> method gets called.
> (Are there other mutable Schema sub-types that I'm not thinking about?)