[
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203181#comment-13203181
]
Scott Carey commented on AVRO-1006:
-----------------------------------
More notes:
* Schema equivalence has a few variations
** Serialization equivalence -- attribute metadata is irrelevant:
{"type":"int", "java-class":"java.lang.Short"} is equal to "int". Defaults
and doc are also irrelevant in this case.
** Serialization and metadata equivalence, where the two schemas above are
not equivalent.
** Reversible transformation equivalence, e.g. ["int", "string"] equals
["string", "int"], or records with pure field reordering.
* Other schema relationships that are related to equivalence but cannot satisfy
associativity and transitivity
** Alias equivalence is not transitive, but is associative.
** Schema resolution and transformation are often neither transitive nor
associative.
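
To make the first two variations above concrete, here is a minimal Python sketch
(not part of any Avro API). It assumes schemas are already parsed from JSON into
Python values, and the set of keys treated as structural (STRUCTURAL_KEYS) is an
assumption chosen for illustration; the helper names are hypothetical.

```python
# Assumption: only these keys matter for serialization equivalence;
# doc, default, aliases, and custom attributes are treated as metadata.
STRUCTURAL_KEYS = {"type", "name", "namespace", "fields",
                   "symbols", "items", "values", "size"}
PRIMITIVES = {"null", "boolean", "int", "long", "float",
              "double", "bytes", "string"}


def strip_metadata(schema):
    """Return a copy of the parsed schema with non-structural keys removed."""
    if isinstance(schema, str):
        return schema
    if isinstance(schema, list):
        # Union: branch order is kept, since it affects the encoding.
        return [strip_metadata(branch) for branch in schema]
    kept = {}
    for key, value in schema.items():
        if key not in STRUCTURAL_KEYS:
            continue
        if key in ("type", "items", "values"):
            kept[key] = strip_metadata(value)
        elif key == "fields":
            kept[key] = [{"name": f["name"], "type": strip_metadata(f["type"])}
                         for f in value]
        else:
            kept[key] = value
    # Collapse {"type": "int"} to the bare primitive name "int".
    if set(kept) == {"type"} and isinstance(kept["type"], str) \
            and kept["type"] in PRIMITIVES:
        return kept["type"]
    return kept


def serialization_equivalent(a, b):
    """First variation: metadata, defaults, and doc are ignored."""
    return strip_metadata(a) == strip_metadata(b)


def metadata_equivalent(a, b):
    """Second variation: every attribute must match as well."""
    return a == b
```

With these definitions, {"type":"int", "java-class":"java.lang.Short"} and "int"
are serialization equivalent but not metadata equivalent, while ["int", "string"]
and ["string", "int"] are neither, because union branch order is preserved.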
All three equivalence variations above may be useful for different purposes,
especially the first two. Serialization equivalence is important for long term
storage. Full equivalence with metadata is often needed for internal state.
But we may want to let users specify which optional components are included
(attributes, defaults, doc). Doug's point about JSON being an unordered format
is important and limits using the JSON string as the fingerprint.
Perhaps we can complete the Avro Schema for schemas (AVRO-251), which can define
field order and equivalence unambiguously and which all implementations should
be able to support. The output bytes from the Avro binary serialization of the
schema can then be fed to a hash algorithm.
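
A rough Python sketch of that idea follows. Since the schema for schemas does
not exist yet, canonical_bytes() below is only a stand-in: it uses JSON with
sorted keys as a deterministic byte stream in place of the Avro binary
encoding. The choice of SHA-256 truncated to 64 bits and the function names are
likewise assumptions for illustration, not a proposed standard.

```python
import hashlib
import json


def canonical_bytes(schema):
    # Stand-in for the Avro binary serialization of the schema:
    # sorted keys and no whitespace give a deterministic byte stream.
    return json.dumps(schema, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def fingerprint64(schema):
    # Assumption: any well-mixed hash would do; take the first 8 bytes.
    digest = hashlib.sha256(canonical_bytes(schema)).digest()
    return int.from_bytes(digest[:8], "big")


# The same schema written with two different key orders.
text_a = '{"type":"record","name":"R","fields":[]}'
text_b = '{"fields":[],"name":"R","type":"record"}'

# Hashing the raw JSON strings disagrees, which is the problem with
# using the JSON text itself as the fingerprint input.
raw_a = hashlib.sha256(text_a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha256(text_b.encode("utf-8")).hexdigest()

# Hashing the canonical bytes agrees.
fp_a = fingerprint64(json.loads(text_a))
fp_b = fingerprint64(json.loads(text_b))
```

Here raw_a and raw_b differ even though the schemas are identical, while fp_a
and fp_b match because both inputs reduce to the same canonical bytes.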
> Fingerprints for Avro Schemas
> -----------------------------
>
> Key: AVRO-1006
> URL: https://issues.apache.org/jira/browse/AVRO-1006
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Labels: features
> Attachments: schema-fingerprinting.html, schema-fingerprinting.html,
> schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.
> Fingerprints are designed such that the chance of a collision is very, very
> low.