[
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203219#comment-13203219
]
Thiruvalluvan M. G. commented on AVRO-1006:
-------------------------------------------
{quote}
Doug's point about JSON being an unordered format is important and limits using
the json string as the fingerprint.
Perhaps we can complete the Avro Schema for schemas (AVRO-251) which can define
field order and equivalence unambiguously and all implementations should be
able to support. The output bytes from the Avro binary serialization of the
schema can be used to feed a hash algorithm.
{quote}
While representing the canonical schema as Avro data reduces ambiguity (compared
to the JSON representation), it does not eliminate it. Non-empty arrays (and
maps) can be represented in Avro in more than one way, because the binary
encoding lets a writer split the items across any number of blocks.
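As a rough illustration of the array ambiguity, the sketch below hand-encodes
an array of ints following the Avro binary encoding rules (longs as zig-zag
varints, arrays as a series of counted blocks ended by a zero count). The
helper names are made up for this example and are not part of any Avro library.

```python
def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit value as an Avro long (zig-zag, then varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_int_array(blocks) -> bytes:
    """Encode an int array, given an explicit split of its items into blocks."""
    out = bytearray()
    for block in blocks:
        if not block:
            continue  # a zero count would end the array early
        out += encode_long(len(block))
        for item in block:
            out += encode_long(item)
    out += encode_long(0)  # zero-count block terminates the array
    return bytes(out)


def decode_int_array(data: bytes) -> list:
    """Decode an int array written by any conforming writer."""
    pos = 0

    def read_long() -> int:
        nonlocal pos
        shift, z = 0, 0
        while True:
            b = data[pos]
            pos += 1
            z |= (b & 0x7F) << shift
            if not b & 0x80:
                break
            shift += 7
        return (z >> 1) ^ -(z & 1)

    items = []
    while True:
        count = read_long()
        if count == 0:
            return items
        if count < 0:  # negative count: a byte size follows the count
            count = -count
            read_long()
        items.extend(read_long() for _ in range(count))


one_block = encode_int_array([[1, 2, 3]])
two_blocks = encode_int_array([[1], [2, 3]])
print(one_block.hex(), two_blocks.hex())
print(decode_int_array(one_block) == decode_int_array(two_blocks))
```

Both byte strings decode to the same array, yet they differ, so hashing the
binary form directly would give two fingerprints for one schema unless the
block layout is also pinned down.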
Doug's observation implies that we cannot use a third-party JSON library to
generate the canonical representation, since such a library is free to choose
its own member order and whitespace. For fingerprinting to work, we need some
canonical representation (which by definition is not ambiguous). Either we
restrict an existing standard (by removing its ambiguities) or invent a new one.
I think Raymie's canonicalization rules are simple, and given that we'll have
only US-ASCII characters in the canonical representation, writing a JSON
generator in any language will not be hard. The canonical form will also be
parsable (with no new code) and human-readable.
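To give a feel for how small such a hand-written generator could be, here is a
minimal sketch. The rules it applies (a fixed key order, no whitespace,
dropping keys outside the list, \u escapes for anything outside printable
US-ASCII) are assumptions chosen for illustration and are not taken from
Raymie's attachment. The truncated SHA-256 is only a stand-in for whatever
64-bit hash is eventually chosen.

```python
import hashlib
import json

# Hypothetical fixed key order for this sketch; keys not listed are dropped.
KEY_ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def write_string(s: str) -> str:
    """Write a JSON string literal using only printable US-ASCII characters."""
    out = ['"']
    for ch in s:
        if ch in '"\\':
            out.append("\\" + ch)
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)
        else:
            # Escape as UTF-16 code units so non-BMP characters stay valid JSON.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    out.append('"')
    return "".join(out)


def canonical(node) -> str:
    """Emit one fixed text form for a parsed schema (str, int, list, dict)."""
    if isinstance(node, str):
        return write_string(node)
    if isinstance(node, bool) or not isinstance(node, (int, list, dict)):
        raise TypeError("unsupported value in schema: %r" % (node,))
    if isinstance(node, int):
        return str(node)
    if isinstance(node, list):
        return "[" + ",".join(canonical(x) for x in node) + "]"
    members = [write_string(k) + ":" + canonical(node[k])
               for k in KEY_ORDER if k in node]
    return "{" + ",".join(members) + "}"


# The same schema written with different member order and whitespace.
a = json.loads('{"type": "record", "name": "Pair",'
               ' "fields": [{"type": "int", "name": "x"}]}')
b = json.loads('{"fields":[{"name":"x","type":"int"}],'
               '"name":"Pair","type":"record"}')
canon_a = canonical(a)
canon_b = canonical(b)

# Stand-in 64-bit fingerprint: first 8 bytes of SHA-256 (an assumption).
fingerprint = hashlib.sha256(canon_a.encode("ascii")).digest()[:8]
print(canon_a)
print(canon_a == canon_b, fingerprint.hex())
```

Parsing still goes through an ordinary JSON library here; only the writing
side is hand-rolled, which is the part that has to be deterministic.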
> Fingerprints for Avro Schemas
> -----------------------------
>
> Key: AVRO-1006
> URL: https://issues.apache.org/jira/browse/AVRO-1006
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Labels: features
> Attachments: schema-fingerprinting.html, schema-fingerprinting.html,
> schema-fingerprinting.html
>
>
> Add function that returns a standardized, 64-bit fingerprint for schemas.
> Fingerprints are designed such that the chance of a collision is very, very
> low.