[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192915#comment-13192915
 ] 

Raymie Stata commented on AVRO-1006:
------------------------------------

In implementing canonical forms, we identified some bugs (both spec and impl 
bugs) that it would be nice to resolve:

* The parser assumes that names are defined before they are used, although this 
is not required by the specification.  We recommend that the spec be changed to 
agree with the impl (i.e., require that names are defined before they are used).

* When a schema name is redefined, the parser throws an exception, even if the 
two definitions of the name are the same.  This is contrary to the spec, which 
says "A schema may only contain multiple definitions of a fullname if the 
definitions are equivalent."  We recommend that the spec be changed to agree 
with the implementation (i.e., disallow multiple definitions of the same name, 
even if the definitions are identical).

* The parser calls {{validateName}} on the symbols of an enumeration, 
restricting the syntax of enumeration symbols.  The spec does not call for such 
a restriction.  We recommend that the spec be changed to conform to the 
implementation (i.e., restrict symbols the same way we restrict names).  This 
helps in canonicalization (no need to worry about Unicode normalization).

* {{Schema.validateName}} uses {{Character.isLetter}} and 
{{Character.isLetterOrDigit}} to test characters.  These accept any Unicode 
letter or digit in the Basic Multilingual Plane (supplementary characters are 
excluded).  The Avro spec says that names should be restricted to ASCII 
letters, digits, and underscores.  We think this is an implementation bug and 
should be fixed.  (Again, nice to avoid Unicode normalization.)

* When the parser descends into a named schema, the default namespace in the 
{{names}} variable is stored into the local variable {{savedSpace}}, which is 
restored on exit.  However, if the routine exits abruptly (with an exception), 
this restoration does not occur.  This is probably a bug, and the restoration 
should happen in a {{finally}} block.  (In {{Parser.parse}}, the flag 
{{validateNames}} is already restored in a {{finally}} clause.)
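
To make the last two items concrete, here is a minimal sketch of what the fixes might look like.  It is not the actual Avro parser code: the {{ParserFixSketch}} class, its {{Names}} holder, and the {{parseNamed}} and {{isValidName}} methods are hypothetical stand-ins, and the ASCII rule assumes the spec's naming pattern (first character in [A-Za-z_], later characters in [A-Za-z0-9_]).

```java
// Hypothetical sketch, not the real Avro Schema.Parser implementation.
final class ParserFixSketch {

    // Stand-in for the parser's name table; only the default namespace is modeled.
    static final class Names {
        private String space;
        String space() { return space; }
        void space(String s) { this.space = s; }
    }

    // ASCII-only check, assuming the spec's rule: the first character must be
    // in [A-Za-z_] and each later character in [A-Za-z0-9_].
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letterOrUnderscore =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            if (!(letterOrUnderscore || (i > 0 && digit))) {
                return false;
            }
        }
        return true;
    }

    // Descends into a named schema. The saved default namespace is restored
    // in a finally block, so it is put back even if validation or the body throws.
    static void parseNamed(Names names, String namespace, String name, Runnable body) {
        String savedSpace = names.space();
        try {
            if (!isValidName(name)) {
                throw new IllegalArgumentException("Illegal name: " + name);
            }
            names.space(namespace);
            body.run();
        } finally {
            names.space(savedSpace);
        }
    }
}
```

With this shape, a name such as "caf\u00e9" is rejected even though {{Character.isLetter}} accepts its last character, and a failed parse leaves the default namespace as it was on entry.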

Before submitting fixes, I'll file separate JIRAs, but I'd like to get any 
feedback here first.
                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.  
> Fingerprints are designed so that the chance of a collision is very, very 
> low.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
