[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203082#comment-13203082 ]
Doug Cutting commented on AVRO-1006:
------------------------------------
Some notes:
- Primitive types may have attributes, e.g., {"type":"int",
"java-class":"java.lang.Short"}, so only primitives without any attributes may
be represented by their name alone.
- Attributes within JSON objects are unordered, and a correct JSON parser need
not preserve their order. Relying on order preservation may force some
implementations to write their own JSON libraries.
- With multiple Avro implementations, the chance of an inconsistent
canonicalization implementation is significant. Creating an adequate test
suite and validating all implementations would require considerable effort.
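To make the first two notes concrete, here is a minimal Java sketch of a canonicalizer. It assumes the schema has already been parsed into plain Java values (Map for JSON objects, List for arrays, String/Number/Boolean/null for scalars). The class and method names are hypothetical, and this is neither Avro's API nor any published canonical form. The sketch does two things. It collapses a primitive to its bare name only when no other attributes are present. It also writes object keys in sorted order, so the output does not depend on whether the JSON parser preserved attribute order.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical canonicalizer sketch (not Avro's API). Input is a schema already
 * parsed into plain values: Map for JSON objects, List for arrays, and
 * String/Number/Boolean/null for scalars.
 */
final class SchemaCanonicalizer {
  private static final Set<String> PRIMITIVES = Set.of(
      "null", "boolean", "int", "long", "float", "double", "bytes", "string");

  static String canonicalize(Object schema) {
    StringBuilder out = new StringBuilder();
    write(schema, out);
    return out.toString();
  }

  private static void write(Object node, StringBuilder out) {
    if (node instanceof Map) {
      Map<?, ?> map = (Map<?, ?>) node;
      Object type = map.get("type");
      // {"type":"int"} with no other attributes may be written as just "int";
      // {"type":"int","java-class":...} must keep its attributes.
      if (map.size() == 1 && type instanceof String && PRIMITIVES.contains(type)) {
        writeString((String) type, out);
        return;
      }
      // Re-sort keys so the result does not depend on the parser's ordering.
      TreeMap<String, Object> sorted = new TreeMap<>();
      for (Map.Entry<?, ?> e : map.entrySet()) {
        sorted.put((String) e.getKey(), e.getValue());
      }
      out.append('{');
      boolean first = true;
      for (Map.Entry<String, Object> e : sorted.entrySet()) {
        if (!first) out.append(',');
        first = false;
        writeString(e.getKey(), out);
        out.append(':');
        write(e.getValue(), out);
      }
      out.append('}');
    } else if (node instanceof List) {
      List<?> list = (List<?>) node;
      out.append('[');
      for (int i = 0; i < list.size(); i++) {
        if (i > 0) out.append(',');
        write(list.get(i), out);
      }
      out.append(']');
    } else if (node instanceof String) {
      writeString((String) node, out);
    } else {
      // Numbers, booleans and null; a real canonical form would also need to
      // pin down number formatting.
      out.append(String.valueOf(node));
    }
  }

  private static void writeString(String s, StringBuilder out) {
    out.append('"');
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);
      } else if (c < 0x20) {
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    out.append('"');
  }

  public static void main(String[] args) {
    System.out.println(canonicalize(Map.of("type", "int")));
    System.out.println(canonicalize(Map.of("type", "int", "java-class", "java.lang.Short")));
  }
}
```

Sorting keys is only one way to remove the dependence on parser ordering. Any fixed order works, provided every implementation agrees on it, and that agreement is exactly the consistency problem the third note raises.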
Given the above, I'd be hesitant to build a system that depends on consistent
canonical schemas for correct operation. Folks who build systems that use Avro
would thus be wise to design them to gracefully handle inconsistent
canonicalization. For example, Avro's RPC handshake currently uses a
fingerprint-like approach without requiring canonicalization. Two
implementations that represent a schema using the same string will have more
efficient handshakes, but implementations that produce different strings for
equivalent schemas will still interoperate correctly. So a standard,
recommended canonical form could be useful, but folks should perhaps not assume
that every implementation is correct.
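The handshake point can be illustrated with a simplified model. This is a hypothetical sketch, not Avro's actual handshake classes or wire format. The client first offers only an MD5 hash of its schema text and sends the full text only if the server has not seen that hash. When two equivalent schemas are written as different strings, the only consequence is a cache miss and one extra exchange.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a hash-first handshake; not Avro's classes or wire format. */
final class HandshakeServer {
  // Schema text previously received, keyed by the hex MD5 of that exact text.
  private final Map<String, String> knownSchemas = new HashMap<>();
  int fullSchemaTransfers = 0;

  static String md5Hex(String schemaText) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5")
          .digest(schemaText.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
  }

  /**
   * Simulates one connection and returns the number of exchanges it took:
   * 1 if the hash alone was recognized, 2 if the full schema had to be sent.
   */
  int connect(String clientSchemaText) {
    String hash = md5Hex(clientSchemaText);
    if (knownSchemas.containsKey(hash)) {
      return 1;
    }
    // Miss: the client resends with the full text, which the server caches.
    // Equivalent schemas written differently also land here; only speed suffers.
    knownSchemas.put(hash, clientSchemaText);
    fullSchemaTransfers++;
    return 2;
  }

  public static void main(String[] args) {
    HandshakeServer server = new HandshakeServer();
    System.out.println(server.connect("\"int\""));            // first contact: 2
    System.out.println(server.connect("\"int\""));            // same string, hash hit: 1
    System.out.println(server.connect("{\"type\":\"int\"}")); // equivalent, different string: 2
  }
}
```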
I like the idea of a schema repository. A related idea I've had is to use
something like a URL shortener. Instead of mapping url->url, it could map
url->schema. One would register one's schema with the shortener, then hand out
references. A shortener would, as an optimization, return the same ID for
equivalent schemas. The shortener would need to rely on only a single
canonicalization implementation: its own.
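A schema shortener along these lines could look like the following minimal sketch. Everything here is hypothetical: the class, the integer IDs, and the toy whitespace-stripping canonicalizer in the example. That canonicalizer would wrongly merge schemas whose string default values differ only in whitespace. The design point is that the registry applies exactly one canonicalization function, its own, so clients never need to agree on one. A canonicalizer that misses some equivalences only costs extra IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical "schema shortener": maps short IDs to schemas, like a URL shortener. */
final class SchemaShortener {
  // The registry's single canonicalization function; clients never run it.
  private final UnaryOperator<String> canonicalize;
  private final Map<String, Integer> idByCanonicalForm = new HashMap<>();
  private final List<String> schemaById = new ArrayList<>();

  SchemaShortener(UnaryOperator<String> canonicalize) {
    this.canonicalize = canonicalize;
  }

  /** Registers a schema and returns its ID, reusing the ID of an equivalent schema. */
  int register(String schemaText) {
    String key = canonicalize.apply(schemaText);
    Integer existing = idByCanonicalForm.get(key);
    if (existing != null) {
      return existing;
    }
    int id = schemaById.size();
    schemaById.add(schemaText);
    idByCanonicalForm.put(key, id);
    return id;
  }

  /** Resolves an ID handed out earlier to the schema text first registered for it. */
  String lookup(int id) {
    return schemaById.get(id);
  }

  public static void main(String[] args) {
    // Toy canonicalizer for illustration only: strips all whitespace.
    SchemaShortener registry = new SchemaShortener(s -> s.replaceAll("\\s+", ""));
    int a = registry.register("{\"type\": \"string\"}");
    int b = registry.register("{\"type\":\"string\"}");
    int c = registry.register("\"long\"");
    System.out.println(a + " " + b + " " + c); // 0 0 1
    System.out.println(registry.lookup(b));     // {"type": "string"}
  }
}
```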
> Fingerprints for Avro Schemas
> -----------------------------
>
> Key: AVRO-1006
> URL: https://issues.apache.org/jira/browse/AVRO-1006
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Labels: features
> Attachments: schema-fingerprinting.html, schema-fingerprinting.html,
> schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.
> Fingerprints are designed so that the chance of a collision is very, very
> low.
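As an illustration of the requested feature, a 64-bit fingerprint can be computed over a schema's canonical form with a table-driven Rabin fingerprint (a CRC-64 variant). This is a sketch, not the design in the issue's attachments. The constant is assumed to be the one the Avro specification later published for its CRC-64-AVRO fingerprint, so verify it against the spec before relying on it. For n distinct schemas and a well-mixed 64-bit fingerprint, the birthday approximation puts the chance of any collision at about n^2 / 2^65, roughly 2.7 x 10^-8 for a million schemas. A CRC is not cryptographic, though. It guards against accidental collisions, not deliberately constructed ones.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of a table-driven 64-bit Rabin fingerprint (CRC-64 variant). */
final class SchemaFingerprint {
  // Assumed polynomial/seed; it is also the fingerprint of the empty input.
  static final long EMPTY = 0xc15d213aa4d7a795L;
  private static final long[] FP_TABLE = new long[256];

  static {
    // Precompute the effect of shifting each possible low byte out of the register.
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        // Shift right one bit; if the bit shifted out was 1, XOR in the polynomial.
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      FP_TABLE[i] = fp;
    }
  }

  static long fingerprint64(byte[] data) {
    long fp = EMPTY;
    for (byte b : data) {
      // Consume one byte at a time using the precomputed table.
      fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  /** Fingerprints the UTF-8 bytes of a schema that is assumed to be canonical already. */
  static long fingerprint64(String canonicalSchema) {
    return fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    System.out.printf("%016x%n", fingerprint64("\"int\""));
    System.out.printf("%016x%n", fingerprint64("\"long\""));
  }
}
```

Because the fingerprint is computed over bytes, two implementations that canonicalize the same schema differently will produce different fingerprints. That is the failure mode the comment above recommends designing around.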