[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205616#comment-13205616
 ] 

Doug Cutting commented on AVRO-1022:
------------------------------------

An implementation would be naive to trust that other implementations have 
validated all names in schemas it receives.  Java currently disables validation 
when reading a schema from a data file, since it's more important to be able to 
read the data.  With the generic APIs, name validation isn't required, and many 
applications use only the generic APIs.

This would not require support for Unicode identifiers in programming 
languages.  A code generator should escape any character in a name that's not 
easy for it to represent in an identifier.  We'd just be permitting code 
generators to take advantage of Unicode identifiers in programming languages 
that do support them.
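
As a rough illustration of the escaping described above, here is a minimal 
sketch in Java.  The class name NameMangler, the method mangle, and the _uXXXX 
escape format are all hypothetical choices made for this example; they are not 
what Avro's code generator does.

```java
// Hypothetical helper, not part of Avro: turns an arbitrary schema name
// into an ASCII-only identifier by escaping anything unexpected.
final class NameMangler {
    private NameMangler() {}

    /**
     * Keeps ASCII letters, underscores, and non-leading ASCII digits as-is;
     * every other UTF-16 code unit becomes _uXXXX (lowercase hex).
     */
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean asciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            boolean asciiDigit = c >= '0' && c <= '9';
            if (asciiLetter || c == '_' || (asciiDigit && i > 0)) {
                out.append(c);
            } else {
                // Escape rather than fail on an unexpected character.
                out.append(String.format("_u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

With this sketch, mangle("user-id") yields "user_u002did" and a name such as 
"caf\u00e9" yields "caf_u00e9".  The mapping is not collision-free (a schema 
could already contain the literal name "caf_u00e9"), so a real generator would 
need to handle that case too.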

> If we went the other way (change the spec), we'd have to answer a bunch of 
> design questions (decide what is a "letter," decide on normalization, figure 
> out how to mangle names in various languages, etc.), and then implement 
> validation in each language [ ... ]

I disagree.  Even if we removed all restrictions on naming I don't think we'd 
add much burden to implementations.  Most implementations don't do code 
generation.  Code generators already need to mangle names.  A code generator 
should already escape rather than die when it sees an unexpected character in a 
name.  (The alternative is an inability to generate code for schemas that 
someone else controls, a poor choice.)

So I don't see a new interoperability problem this would create.  We already 
have schemas in the wild whose names are invalid.

Perhaps we should change the spec to recommend that names be restricted to 
ASCII, for ease of programming with generated APIs in all languages.  We 
might also check this in the compiler, forcing folks to specify 
--escape-non-ASCII-names if they really want to generate code for a schema 
whose names contain non-ASCII characters.  That would discourage the use of 
non-ASCII names in schemas that you do control.  In general, we could 
encourage implementations both to not trust that identifiers are all-ASCII 
and to promote all-ASCII identifiers.
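
The compiler check suggested above might look something like the following 
Java sketch.  The class AsciiNameCheck, its methods, and the boolean parameter 
standing in for --escape-non-ASCII-names are hypothetical; no such option 
exists in the Avro compiler as far as this message states, so treat this only 
as an illustration of the proposal.

```java
import java.util.List;

// Hypothetical sketch of the proposed compiler check, not existing Avro code.
final class AsciiNameCheck {
    private AsciiNameCheck() {}

    /** True if every UTF-16 code unit in the name is in the ASCII range. */
    static boolean isAscii(String name) {
        return name.chars().allMatch(c -> c < 128);
    }

    /**
     * Rejects non-ASCII names unless the caller has opted in, mirroring the
     * proposed --escape-non-ASCII-names flag.
     */
    static void check(List<String> names, boolean escapeNonAsciiNames) {
        for (String name : names) {
            if (!isAscii(name) && !escapeNonAsciiNames) {
                throw new IllegalArgumentException(
                    "Name \"" + name + "\" contains non-ASCII characters; "
                    + "pass --escape-non-ASCII-names to generate code anyway");
            }
        }
    }
}
```

The point of the opt-in is that code generation still works for schemas someone 
else controls, while authors of new schemas get a nudge toward all-ASCII names.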
                
> Error in validate name
> ----------------------
>
>                 Key: AVRO-1022
>                 URL: https://issues.apache.org/jira/browse/AVRO-1022
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Raymie Stata
>            Priority: Minor
>         Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
