[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205192#comment-13205192
 ] 

Thiruvalluvan M. G. commented on AVRO-1022:
-------------------------------------------

The basic trouble is that Unicode has multiple representations for the same 
text. For example, see
http://weblogs.java.net/blog/joconner/archive/2006/06/strings_equals.html
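
To make the point concrete, here is a minimal sketch (not taken from the
article; the class and method names are made up for illustration). It builds
the same visible character in two ways and compares the results, using
java.text.Normalizer from the JDK:

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of Avro.
class UnicodeEquality {
    // Compare two strings after normalizing both to NFC (canonical composition).
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";  // 'e' followed by U+0301 COMBINING ACUTE ACCENT

        // Same text to a reader, different code point sequences to String.equals().
        System.out.println(precomposed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```

Any name comparison that wants to treat these two strings as the same name
would have to normalize first, in every implementation.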

Java has invested a lot of effort in supporting international characters, and 
even so we have trouble. In many other programming languages it is worse.

Restricting names to Unicode letters and digits almost completely rules out 
the use of non-ASCII characters. In the example shown in the above article, 
combining accents, and therefore accented characters written in decomposed 
form, are not recognized as letters. As another example, in my native language 
Tamil, there are 247 "letters". Unicode models them not as individual letters 
but as about 60 symbols that are combined to form the letters. When 
represented in Unicode, only about 30 or so Tamil "letters" pass Java's 
isLetter() test. Almost no meaningful Tamil word will pass the isLetter() 
test. The same is true of (at least some) other Indian languages as well.
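
A small sketch of the isLetter() behaviour described above. The class and the
allLetters() helper are made up for illustration, and the sample word is my
own choice (the word "Tamil" written in Tamil script), given as escapes so the
source stays ASCII:

```java
// Hypothetical demo class, not part of Avro.
class IsLetterCheck {
    // True only if every code point in s passes Character.isLetter().
    static boolean allLetters(String s) {
        return s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // The word "Tamil" in Tamil script:
        // TA (U+0BA4), MA (U+0BAE) + vowel sign I (U+0BBF), LLLA (U+0BB4) + pulli (U+0BCD)
        String tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";

        System.out.println(Character.isLetter(0x0BA4)); // true:  base consonant
        System.out.println(Character.isLetter(0x0BBF)); // false: vowel sign (combining mark)
        System.out.println(Character.isLetter(0x0BCD)); // false: pulli (combining mark)
        System.out.println(allLetters(tamil));          // false: the word as a whole fails
        System.out.println(allLetters("e\u0301"));      // false: decomposed e-acute fails too
    }
}
```

The base consonants pass, but the vowel signs and the pulli are combining
marks, so an ordinary word is rejected as soon as it contains one of them.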

Moreover, it is better to keep the spec more restrictive to start with and 
open it up later. I'm not sure how well the current implementations support 
non-ASCII Avro names. It is not clear that the effort to make our 
implementations conformant brings commensurate benefits, at least for now. For 
example, in order to properly support this single feature, we may have to make 
the C++ implementation use a large library like ICU. The vast majority of Avro 
C++ users just don't need it.
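
For comparison, an ASCII-only check needs no Unicode tables or libraries at
all. The sketch below assumes the rule is "first character in [A-Za-z_], the
rest in [A-Za-z0-9_]". It is not the code in Schema.validateName or in the
attached patch, and the class and method names are made up:

```java
// Hypothetical demo class, not the actual Avro implementation.
class AsciiNameCheck {
    // Assumed rule: first char in [A-Za-z_], remaining chars in [A-Za-z0-9_].
    static boolean isValidAsciiName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letterOrUnderscore =
                    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            // Digits are allowed everywhere except the first position.
            if (!(letterOrUnderscore || (i > 0 && digit))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidAsciiName("user_id2")); // true
        System.out.println(isValidAsciiName("2user"));    // false: starts with a digit
        System.out.println(isValidAsciiName("caf\u00E9")); // false: non-ASCII letter
    }
}
```

The same few comparisons port directly to C++ or any other implementation
language.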


                
> Error in validate name
> ----------------------
>
>                 Key: AVRO-1022
>                 URL: https://issues.apache.org/jira/browse/AVRO-1022
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Raymie Stata
>            Priority: Minor
>         Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
