[
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205192#comment-13205192
]
Thiruvalluvan M. G. commented on AVRO-1022:
-------------------------------------------
The basic trouble is that Unicode has multiple representations for the same
text. For example, see
http://weblogs.java.net/blog/joconner/archive/2006/06/strings_equals.html
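As a minimal sketch of the point above (the class and method names here are
made up for illustration and are not part of Avro), the same visible text "é"
can be built from two different code point sequences. A plain equals() treats
them as different strings, and they compare equal only after normalization
with java.text.Normalizer:

```java
import java.text.Normalizer;

public class UnicodeForms {
    // "é" as one precomposed code point, U+00E9.
    static final String PRECOMPOSED = "\u00E9";
    // "é" as 'e' (U+0065) followed by a combining acute accent (U+0301).
    static final String DECOMPOSED = "e\u0301";

    // Compares the raw code unit sequences, as String.equals() does.
    static boolean rawEquals() {
        return PRECOMPOSED.equals(DECOMPOSED);
    }

    // Compares the two strings after bringing both to the NFC form.
    static boolean normalizedEquals() {
        String a = Normalizer.normalize(PRECOMPOSED, Normalizer.Form.NFC);
        String b = Normalizer.normalize(DECOMPOSED, Normalizer.Form.NFC);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("raw equals:        " + rawEquals());
        System.out.println("normalized equals: " + normalizedEquals());
    }
}
```

Any name comparison that admits non-ASCII text would have to decide whether
and how to normalize before comparing.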
Java has invested a lot of effort in supporting international characters. Even
so, we still run into trouble. In many other programming languages it is worse.
Restricting names to Unicode letters and digits almost wipes out the use of
non-ASCII characters completely. In the example shown in the above article,
accents and accented characters are not recognized as letters. As another
example, in my native language Tamil, there are 247 "letters". Unicode tries to
model them not as individual letters but as symbols (about 60 of them). By
combining the symbols it makes up the letters. When represented in Unicode,
only about 30 or so Tamil "letters" pass Java's isLetter() test. Almost no
meaningful Tamil word will pass the isLetter() test. The same is true of (at
least some) other Indian languages as well.
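A small sketch of this behaviour, using the word "Tamil" written in Tamil
script as the sample. The class name and the allLetters() helper are
hypothetical and exist only for this illustration; Character.isLetter(int) is
the standard JDK method:

```java
public class TamilLetterCheck {
    // The word "Tamil" in Tamil script: TA, MA, VOWEL SIGN I, LLLA, VIRAMA.
    static final String TAMIL_WORD = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";

    // Hypothetical helper: true only if every code point passes isLetter().
    static boolean allLetters(String s) {
        return s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // The base consonant KA (U+0B95) is classified as a letter.
        System.out.println(Character.isLetter(0x0B95));
        // The vowel sign AA (U+0BBE) is a combining mark, not a letter.
        System.out.println(Character.isLetter(0x0BBE));
        // The whole word fails because it contains a vowel sign and a virama.
        System.out.println(allLetters(TAMIL_WORD));
    }
}
```

The base consonant passes, but the vowel sign and the virama are classified
as marks, so an ordinary word is rejected as a whole.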
Moreover, it is better to keep the spec more restrictive to start with and open
it up later. I'm not sure what level of support the current implementations
have for non-ASCII Avro names. It is not clear that the effort to make our
implementations conformant brings commensurate benefits, at least for now. For
example, in order to properly support this single feature, we may have to make
the C++ implementation use a large library like ICU. The vast majority of Avro
C++ users just don't need it.
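For comparison, an ASCII-only check needs nothing beyond plain character
comparisons. The following is only a sketch, assuming the rule that a name
starts with [A-Za-z_] and continues with [A-Za-z0-9_]; the class and method
names are hypothetical, and this is not the code of Schema.validateName or of
the attached patch:

```java
public class AsciiNameCheck {
    // Hypothetical validator: first char in [A-Za-z_], rest in [A-Za-z0-9_].
    static boolean isValidAsciiName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letterOrUnderscore =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            // Digits are allowed everywhere except the first position.
            boolean ok = (i == 0) ? letterOrUnderscore : (letterOrUnderscore || digit);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidAsciiName("user_id2"));
        System.out.println(isValidAsciiName("2fast"));
        System.out.println(isValidAsciiName("caf\u00E9"));
    }
}
```

A check of this shape ports directly to C++ or any other language without a
Unicode library, which is the trade-off being argued for here.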
> Error in validate name
> ----------------------
>
> Key: AVRO-1022
> URL: https://issues.apache.org/jira/browse/AVRO-1022
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Raymie Stata
> Priority: Minor
> Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.