[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203219#comment-13203219
 ] 

Thiruvalluvan M. G. commented on AVRO-1006:
-------------------------------------------

{quote}
Doug's point about JSON being an unordered format is important and limits using 
the json string as the fingerprint.
Perhaps we can complete the Avro Schema for schemas (AVRO-251) which can define 
field order and equivalence unambiguously and all implementations should be 
able to support. The output bytes from the Avro binary serialization of the 
schema can be used to feed a hash algorithm.
{quote}

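While representing the canonical schema as Avro data reduces ambiguity (compared 
to the JSON representation), it does not eliminate it. Non-empty arrays (and 
maps) can be represented in Avro in more than one way, because the binary 
encoding writes them as a series of blocks and the same items can be split 
across blocks differently.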
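
To make the array ambiguity concrete, here is a minimal sketch in Python. It assumes only the block layout from the Avro binary encoding (a long item count, then the items, with a zero count ending the array); the helper names are made up for this illustration and are not part of any Avro library.

```python
def write_long(n: int) -> bytes:
    """Encode an int as an Avro long: zig-zag, then 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int):
    """Decode one Avro long starting at pos; return (value, next position)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_array(blocks) -> bytes:
    """Encode an array of longs, split into the given blocks (hypothetical helper)."""
    out = bytearray()
    for block in blocks:
        if not block:
            continue  # a zero count would be read as the end marker
        out += write_long(len(block))
        for item in block:
            out += write_long(item)
    out += write_long(0)  # zero count terminates the array
    return bytes(out)


def decode_array(buf: bytes) -> list:
    """Decode an array of longs, whatever block split the writer chose."""
    items = []
    pos = 0
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:
            return items
        if count < 0:
            # Negative count: a byte size follows, which this sketch skips.
            count = -count
            _, pos = read_long(buf, pos)
        for _ in range(count):
            value, pos = read_long(buf, pos)
            items.append(value)


one_block = encode_array([[1, 2, 3]])
two_blocks = encode_array([[1, 2], [3]])
```

Both byte strings decode to the same array, yet they differ byte for byte, so hashing the raw Avro bytes would give two fingerprints for one schema unless the block split is also fixed by rule.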

Doug's observation implies that we cannot use a third-party JSON library to 
generate the canonical representation, since such a library is free to choose 
its own field order and whitespace. For fingerprinting to work, we need some 
canonical representation (which by definition is not ambiguous). Either we 
restrict an existing standard (by removing its ambiguities) or invent a new one.

I think Raymie's canonicalization rules are simple, and given that the 
canonical representation will contain only US-ASCII characters, writing a JSON 
generator for it in any language will not be hard. The output will also be 
parsable by existing JSON parsers (with no new code) and human-readable.
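
As a rough sketch of how small such a generator can be, the Python below writes a schema with a fixed attribute order and no whitespace. The rules it applies (the `KEY_ORDER` list, dropping attributes not in that list, rejecting non-ASCII and control characters) are assumptions made for illustration; they are not Raymie's actual rules, which are in the attached proposal.

```python
import json

# Hypothetical fixed attribute order; attributes not listed here are dropped.
KEY_ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def write_canonical(node) -> str:
    """Emit one unambiguous JSON text for a schema held as dicts/lists/strings."""
    if isinstance(node, str):
        if not node.isascii() or any(ord(c) < 0x20 for c in node):
            raise ValueError("sketch only handles printable US-ASCII strings")
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(node, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(node, int):
        return str(node)
    if isinstance(node, list):
        return "[" + ",".join(write_canonical(x) for x in node) + "]"
    if isinstance(node, dict):
        # Attribute order comes from KEY_ORDER, never from the input dict.
        parts = [
            write_canonical(key) + ":" + write_canonical(node[key])
            for key in KEY_ORDER
            if key in node
        ]
        return "{" + ",".join(parts) + "}"
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# The same schema, written with attributes in two different orders.
schema_a = {
    "type": "record",
    "name": "Point",
    "fields": [{"name": "x", "type": "int"}, {"type": "int", "name": "y"}],
}
schema_b = {
    "fields": [{"type": "int", "name": "x"}, {"name": "y", "type": "int"}],
    "name": "Point",
    "type": "record",
}

text_a = write_canonical(schema_a)
text_b = write_canonical(schema_b)
```

A general-purpose serializer such as `json.dumps` follows the input's attribute order here and so yields different strings for `schema_a` and `schema_b`, while the hand-written writer yields one text for both, and that text still parses with a stock `json.loads`.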
                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html, schema-fingerprinting.html, 
> schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.  
> Fingerprints are designed such that the chance of a collision is very, very 
> low.


        
