[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203082#comment-13203082
 ] 

Doug Cutting commented on AVRO-1006:
------------------------------------

Some notes:

- Primitive types may have attributes, e.g., {"type":"int", 
"java-class":"java.lang.Short"}, so only primitives without any attributes may 
be represented by their name alone.

- Attributes within JSON objects are not ordered.  A correct JSON parser need 
not preserve ordering.  Relying on order-preservation may require some 
implementations to write their own JSON libraries.

- With multiple Avro implementations, the chance of an inconsistent 
canonicalization implementation is significant.  Creating an adequate test 
suite and validating all implementations would require significant effort.
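
To make the first two notes concrete, here is a minimal sketch of one possible 
normalization step.  It is not Avro's actual canonical form: the names 
canonicalize and canonical_text, and the rules they apply, are assumptions 
made for illustration only.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def canonicalize(schema):
    """Return a normalized copy of a parsed schema (hypothetical rules)."""
    if isinstance(schema, dict):
        type_name = schema.get("type")
        # Only a primitive with no other attributes collapses to its bare name.
        if set(schema) == {"type"} and isinstance(type_name, str) and type_name in PRIMITIVES:
            return type_name
        # Simplification: this recurses into every attribute; a real
        # implementation would have to leave values such as defaults alone.
        return {key: canonicalize(value) for key, value in schema.items()}
    if isinstance(schema, list):
        return [canonicalize(item) for item in schema]
    return schema


def canonical_text(schema_json):
    """Serialize with sorted keys so parser attribute order cannot matter."""
    parsed = json.loads(schema_json)
    return json.dumps(canonicalize(parsed), sort_keys=True, separators=(",", ":"))
```

Under these assumed rules, {"type":"int"} reduces to "int", while 
{"type":"int", "java-class":"java.lang.Short"} stays an object, and both 
attribute orderings of the latter produce the same text.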

Given the above, I'd be hesitant to build a system that depends on consistent 
canonical schemas for correct operation.  Folks who build systems that use Avro 
would thus be wise to design them to gracefully handle inconsistent 
canonicalization.  For example, Avro's RPC handshake currently uses a 
fingerprint-like approach without requiring canonicalization.  Two 
implementations that represent a schema using the same string will have more 
efficient handshakes, but implementations that produce different strings for 
equivalent schemas will still interoperate correctly.  So a standard, 
recommended canonical form could be useful, but folks should perhaps not assume 
that every implementation is correct.
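
The graceful-degradation idea can be sketched as a cache keyed by a hash of the 
exact schema text.  This is a simplified model, not Avro's wire protocol; 
SchemaCache, text_hash and fetch_full_text are hypothetical names, and MD5 is 
used here only as an example hash.

```python
import hashlib


def text_hash(schema_text):
    """Hash the exact schema string; no canonicalization is attempted."""
    return hashlib.md5(schema_text.encode("utf-8")).hexdigest()


class SchemaCache:
    """Remembers schema texts by hash and falls back to a full transfer."""

    def __init__(self):
        self._by_hash = {}
        self.full_transfers = 0

    def resolve(self, schema_hash, fetch_full_text):
        # An unknown hash costs one extra transfer, never a wrong answer.
        if schema_hash not in self._by_hash:
            self.full_transfers += 1
            self._by_hash[schema_hash] = fetch_full_text()
        return self._by_hash[schema_hash]
```

Two peers that serialize a schema identically hit the cache.  Peers that 
produce different strings for an equivalent schema only pay for an extra 
transfer, so correctness never depends on consistent canonicalization.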

I like the idea of a schema repository.  A related idea I've had is to use 
something like a URL shortener.  Instead of mapping url->url, it could map 
url->schema.  One would register one's schema with the shortener, then hand out 
references.  A shortener would, as an optimization, return the same ID for 
equivalent schemas.  The shortener would need to rely on only a single 
canonicalization implementation, its own.
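
A minimal in-memory sketch of such a shortener follows.  The class 
SchemaShortener, its methods, the integer IDs, and the sorted-key 
normalization are all assumptions for illustration, not an existing Avro 
API.

```python
import json


class SchemaShortener:
    """Maps schemas to short IDs using its own, private canonicalization."""

    def __init__(self):
        self._id_by_canonical = {}
        self._schema_by_id = {}

    def _canonical(self, schema_text):
        # Stand-in normalization: sorted keys, no insignificant whitespace.
        return json.dumps(json.loads(schema_text), sort_keys=True, separators=(",", ":"))

    def register(self, schema_text):
        """Return an ID; equivalent schemas share one ID as an optimization."""
        key = self._canonical(schema_text)
        if key not in self._id_by_canonical:
            new_id = len(self._schema_by_id) + 1
            self._id_by_canonical[key] = new_id
            self._schema_by_id[new_id] = schema_text
        return self._id_by_canonical[key]

    def lookup(self, schema_id):
        """Return the schema text first registered under this ID."""
        return self._schema_by_id[schema_id]
```

Clients only exchange IDs and call lookup, so they never need to agree on a 
canonical form among themselves.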


                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html, schema-fingerprinting.html, 
> schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.  
> Fingerprints are designed such that the chance of a collision is very, very 
> low.
