[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192909#comment-13192909 ]
Raymie Stata commented on AVRO-1006:
------------------------------------
An _Avro schema fingerprint_ is a hash of an Avro schema. For a 64-bit
fingerprint, even within a collection of a million schemas the probability of
any collision is well below 0.001% (the birthday bound puts it at roughly 3 in
100 million). Thus, fingerprints can be used in place of schemas.
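The collision figure can be sanity-checked with the standard birthday
approximation, P ~ 1 - exp(-(n(n-1)/2) / 2^64). The sketch below assumes
fingerprints behave like uniformly random 64-bit values, which is an
idealization (a Rabin fingerprint is a CRC, not a cryptographic hash); the
function name is illustrative.

```python
import math


def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate chance that at least two of n random `bits`-bit hashes collide."""
    pairs = n * (n - 1) / 2.0  # number of distinct pairs that could collide
    space = 2.0 ** bits        # number of possible hash values
    # Birthday approximation 1 - exp(-pairs / space); expm1 keeps tiny values accurate.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    print(f"{collision_probability(10**6):.1e}")  # prints 2.7e-08
```

For a million schemas this gives about 2.7e-08, far below 0.001%.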
One motivating use case for fingerprints is a pub/sub message bus. Because
multiple writers can publish to the same topic using different schemas, each
message on the bus must be associated with its schema. Rather than include the
full schema with every message, a writer can instead include the schema's
fingerprint, which is much smaller (8 bytes). When a reader encounters a
fingerprint it hasn't seen before, it can look up the corresponding schema and
cache it. (The attached document describes possible lookup mechanisms.)
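A reader-side sketch of this scheme follows. The framing (an 8-byte
little-endian fingerprint in front of each payload) and the `lookup` callable,
which stands in for whatever lookup mechanism is chosen, are hypothetical
assumptions for illustration, not part of the proposal.

```python
import struct
from typing import Callable, Dict, Tuple


def frame(fingerprint: int, payload: bytes) -> bytes:
    """Writer side: prefix the payload with its schema's 64-bit fingerprint (assumed framing)."""
    return struct.pack("<Q", fingerprint) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Reader side: split a message into (fingerprint, payload)."""
    if len(message) < 8:
        raise ValueError("message shorter than the 8-byte fingerprint header")
    (fingerprint,) = struct.unpack_from("<Q", message, 0)
    return fingerprint, message[8:]


class SchemaCache:
    """Caches schemas by fingerprint; `lookup` is a hypothetical resolver (e.g. a registry query)."""

    def __init__(self, lookup: Callable[[int], str]):
        self._lookup = lookup
        self._cache: Dict[int, str] = {}
        self.misses = 0  # how many times the slow lookup path was taken

    def schema_for(self, fingerprint: int) -> str:
        if fingerprint not in self._cache:
            # First sighting of this fingerprint: resolve it once, then remember it.
            self.misses += 1
            self._cache[fingerprint] = self._lookup(fingerprint)
        return self._cache[fingerprint]
```

With `registry = {fp: schema_json}`, `SchemaCache(registry.__getitem__)` resolves
each fingerprint once and serves repeats from memory.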
The proposed approach to fingerprinting is straightforward. First, we convert
Avro schemas into a _canonical form_, such that two equivalent schemas always
have the same canonical form. Once we have the canonical form, we simply take
a 64-bit "Rabin fingerprint" (a CRC) of that text.
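The two steps can be sketched as follows. The canonicalization is a simplified
assumption, loosely following the rules the Avro specification later adopted
as Parsing Canonical Form: it collapses primitive schemas to their names, keeps
only name/type/fields/symbols/items/values/size in that order, and strips
whitespace, but it does not resolve names into full names (namespaces). The
fingerprint loop is a Python port of the table-driven 64-bit Rabin fingerprint
pseudocode in the Avro specification (seed 0xc15d213aa4d7a795); it returns an
unsigned int, whereas a Java `long` would show the same bits as a signed
value. Function names are illustrative.

```python
import json
from typing import Any, List

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Attributes kept, in this fixed order; doc, aliases, defaults, etc. are dropped.
KEPT_ATTRS = ["name", "type", "fields", "symbols", "items", "values", "size"]


def _canon(schema: Any) -> Any:
    """Reduce a parsed JSON schema to a simplified canonical structure."""
    if isinstance(schema, str):
        return schema  # primitive name or named-type reference
    if isinstance(schema, list):
        return [_canon(branch) for branch in schema]  # union
    t = schema["type"]
    if isinstance(t, str) and t in PRIMITIVES:
        return t  # {"type": "int"} collapses to "int"
    out = {}
    for key in KEPT_ATTRS:
        if key not in schema:
            continue
        value = schema[key]
        if key == "fields":
            value = [{"name": f["name"], "type": _canon(f["type"])} for f in value]
        elif key in ("type", "items", "values"):
            value = _canon(value)
        out[key] = value
    return out


def canonical_form(schema: Any) -> str:
    # No whitespace between tokens; non-ASCII kept as UTF-8 rather than \u escapes.
    return json.dumps(_canon(schema), separators=(",", ":"), ensure_ascii=False)


EMPTY = 0xC15D213AA4D7A795  # seed constant from the Avro spec pseudocode


def _make_table() -> List[int]:
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial when the low bit was set.
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table


FP_TABLE = _make_table()


def fingerprint64(data: bytes) -> int:
    """Table-driven 64-bit Rabin (CRC) fingerprint, returned as an unsigned int."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp


def schema_fingerprint(schema: Any) -> int:
    """Canonicalize first, then fingerprint the UTF-8 text."""
    return fingerprint64(canonical_form(schema).encode("utf-8"))
```

Because `doc`, `default`, attribute order and whitespace are dropped before
hashing, two schemas that differ only in those respects get the same
fingerprint.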
> Fingerprints for Avro Schemas
> -----------------------------
>
> Key: AVRO-1006
> URL: https://issues.apache.org/jira/browse/AVRO-1006
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Labels: features
> Attachments: schema-fingerprinting.html
>
>
> Add function that returns a standardized, 64-bit fingerprint that can be used
> as a key in various contexts.
--
This message is automatically generated by JIRA.