[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204844#comment-13204844
 ] 

Doug Cutting commented on AVRO-1022:
------------------------------------

> And doing Unicode right is a lot of work; doing it poorly will just create a 
> nasty source of interop problems.

I don't see this.  Avro already requires that JSON parsers "do Unicode right".  
Permitting non-ASCII in identifiers only creates problems when generating code.  
The potential interoperability problem could be that some implementations, 
when given a schema, would be unable to generate valid code in their 
programming language for that schema, rendering that schema unreadable by 
generated code (although it would still be readable by "generic" code).  That 
would be a bug in that implementation.

Code generators already have to mangle names that are reserved words in the 
generated programming language.  If we permit non-ASCII characters in 
identifiers then implementations might also need to escape non-ASCII characters 
when generating code.  This doesn't seem a huge burden.
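
As a rough illustration of the reserved-word mangling mentioned above, here is 
a minimal sketch.  The class name ReservedWordMangler and the trailing 
underscore are assumptions made for this example only, not a description of 
what Avro's code generator actually does.  SourceVersion.isKeyword() is the 
standard JDK check for keywords and the true/false/null literals.

```java
import javax.lang.model.SourceVersion;

public final class ReservedWordMangler {
    private ReservedWordMangler() {}

    /**
     * Hypothetical helper: appends '_' when the schema name collides with a
     * Java keyword or literal, and returns other names unchanged.
     */
    public static String mangle(String name) {
        return SourceVersion.isKeyword(name) ? name + "_" : name;
    }
}
```

With this sketch a field named "class" would be generated as "class_", while 
"name" would pass through untouched.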

It's important that the specification is clear about what characters 
implementations might expect to see in identifiers so that they know what 
characters need to be escaped.  A conservative implementation might simply 
escape anything that's not permitted in their programming language.
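
The conservative approach could look roughly like the sketch below for a Java 
code generator.  IdentifierEscaper and the $hex$ escape format are 
hypothetical, chosen only for illustration.  The Character methods are the 
standard JDK ones.  Reserved words are a separate concern and are not handled 
here.

```java
public final class IdentifierEscaper {
    private IdentifierEscaper() {}

    /**
     * Hypothetical helper: rewrites a schema name so that every code point
     * Java would not accept at that position is replaced by $hex$.
     */
    public static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            // The first emitted character must be a valid identifier start;
            // later ones only need to be valid identifier parts.
            boolean allowed = (out.length() == 0)
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            // '$' is used as the escape marker, so it is escaped as well.
            // Ignorable characters are escaped to stay on the safe side.
            if (allowed && cp != '$' && !Character.isIdentifierIgnorable(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append('$').append(Integer.toHexString(cp)).append('$');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Under these assumptions "a-b" becomes "a$2d$b" and "1x" becomes "$31$x", while 
a name such as "名前" is left alone because Java already accepts those 
characters in identifiers.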

If the spec is changed we should specify precisely what characters are 
permitted.  Unicode characters have properties, and we can use them to make 
the specification precise.  One property is 'letter', another is 'number'.  
Java's Character.isLetterOrDigit() covers letters and decimal digits.
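
To make the idea concrete, here is a sketch of a name check based on Unicode 
properties.  It assumes the shape of the existing ASCII rule (a letter or 
underscore first, then letters, digits or underscores) and simply widens 
"letter" and "digit" to their Unicode meaning.  NameValidator is a 
hypothetical class, not the actual Schema.validateName, and the rule itself is 
only one possible wording of the spec.

```java
public final class NameValidator {
    private NameValidator() {}

    /**
     * Hypothetical check: first code point is a Unicode letter or '_',
     * every later code point is a Unicode letter, decimal digit or '_'.
     */
    public static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        // Iterate by code point so characters outside the BMP are handled.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!(Character.isLetterOrDigit(cp) || cp == '_')) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

This accepts "user_id2" and "列名" and rejects "2abc" and "a-b".  Note that 
Character.isLetterOrDigit() only counts decimal digits as digits, and that 
combining marks are neither letters nor digits, so a Devanagari word such as 
"नाम" would be rejected by this particular rule.  A precise spec would have to 
say whether mark categories are permitted as well.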

Stepping back, it would be good if folks could use their own languages when 
writing Avro schemas.  It should be possible to use, e.g., column names that 
are in Japanese, Chinese, Hindi, etc.
                
> Error in validate name
> ----------------------
>
>                 Key: AVRO-1022
>                 URL: https://issues.apache.org/jira/browse/AVRO-1022
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Raymie Stata
>            Priority: Minor
>         Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.

