[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204844#comment-13204844 ]
Doug Cutting commented on AVRO-1022:
------------------------------------
> And doing Unicode right is a lot of work; doing it poorly will just create a
> nasty source of interop problems.
I don't see this. Avro already requires that JSON parsers "do Unicode right".
Permitting non-ASCII in identifiers only creates problems when generating code.
The potential interoperability problem could be that some implementations,
when given a schema, would be unable to generate valid code in their
programming language for that schema, rendering that schema unreadable by
generated code (although it would still be readable by "generic" code). That
would be a bug in that implementation.
Code generators already have to mangle names that are reserved words in the
generated programming language. If we permit non-ASCII characters in
identifiers then implementations might also need to escape non-ASCII characters
when generating code. This doesn't seem like a huge burden.
It's important that the specification is clear about what characters
implementations might expect to see in identifiers so that they know what
characters need to be escaped. A conservative implementation might simply
escape anything that's not permitted in its target programming language.
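
To make the escaping idea concrete, here is a minimal sketch of what a conservative code generator might do. It is an illustration only, not Avro's actual compiler code: the class name NameMangler, the "_uXXXX_" escape form, the trailing-underscore rule for reserved words, and the abbreviated reserved-word list are all assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Avro: maps a schema name to a legal ASCII Java identifier.
final class NameMangler {
    // Illustrative subset of Java reserved words; a real generator needs the full list.
    private static final Set<String> RESERVED =
        new HashSet<>(Arrays.asList("class", "int", "new", "public", "return"));

    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        // Walk code points so supplementary characters are escaped as one unit.
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            i += Character.charCount(cp);
            boolean asciiOk = cp < 128 && (Character.isLetterOrDigit(cp) || cp == '_');
            if (asciiOk) {
                out.appendCodePoint(cp);
            } else {
                // Conservative choice: escape everything outside [A-Za-z0-9_].
                out.append("_u").append(String.format("%04X", cp)).append('_');
            }
        }
        String result = out.toString();
        // Assumed convention for reserved words: append an underscore.
        return RESERVED.contains(result) ? result + "_" : result;
    }

    public static void main(String[] args) {
        System.out.println(mangle("user_id"));      // prints user_id
        System.out.println(mangle("class"));        // prints class_
        // A two-character Japanese name, written with escapes to avoid encoding issues.
        System.out.println(mangle("\u540D\u524D")); // prints _u540D__u524D_
    }
}
```

This sketch does not guarantee that mangled names are collision-free (a schema could already contain a name that looks like an escape), so a real implementation would need a rule for that case as well.
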
If the spec is changed we should specify precisely what characters are
permitted. Unicode characters have properties. We can use these properties to
make the specification precise. One property is 'letter', another is 'number'.
Java's Character.isLetterOrDigit() accepts letters and decimal digits, a subset
of these two sets.
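
As a rough illustration of the two candidate rules, the sketch below contrasts an ASCII-only check with one built on Unicode character properties. The class and method names are hypothetical and this is not the code in Avro's Schema class or in the attached patch. The Unicode variant assumes the rule "a letter or underscore first, then letters, decimal digits, or underscores", which is only one way the spec could be worded.

```java
// Hypothetical sketch of two name-validation rules; not Avro's actual implementation.
final class NameCheck {
    // ASCII-only rule: [A-Za-z_][A-Za-z0-9_]*
    static boolean isAsciiName(String name) {
        if (name.isEmpty()) return false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            // Digits are allowed anywhere except the first position.
            if (!(letter || (i > 0 && digit))) return false;
        }
        return true;
    }

    // Property-based rule (assumed wording): a letter or '_' first,
    // then letters, decimal digits, or '_'.
    static boolean isUnicodeName(String name) {
        if (name.isEmpty()) return false;
        int first = name.codePointAt(0);
        if (!(Character.isLetter(first) || first == '_')) return false;
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!(Character.isLetterOrDigit(cp) || cp == '_')) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String japanese = "\u540D\u524D"; // a two-character Japanese name
        System.out.println(isAsciiName(japanese));   // prints false
        System.out.println(isUnicodeName(japanese)); // prints true
        System.out.println(isUnicodeName("9lives")); // prints false
    }
}
```
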
Stepping back, it would be good if folks could use their own languages when
writing Avro schemas. It should be possible to use, e.g., column names that
are in Japanese, Chinese, Hindi, etc.
> Error in validate name
> ----------------------
>
> Key: AVRO-1022
> URL: https://issues.apache.org/jira/browse/AVRO-1022
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Raymie Stata
> Priority: Minor
> Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.