[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205616#comment-13205616 ]
Doug Cutting commented on AVRO-1022:
------------------------------------
An implementation would be naive to trust that other implementations have
validated all names in schemas it receives. Java currently disables validation
when reading a schema from a data file, since it's more important to be able to
read the data. With the generic APIs, name validation isn't required, and many
applications use only the generic APIs.
This would not require support for Unicode identifiers in programming
languages. A code generator should escape any character in a name that's not
easy for it to represent in an identifier. We'd just be permitting code
generators to take advantage of Unicode identifier support where a
programming language has it.
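
A minimal sketch of that kind of escaping, for illustration only. NameMangler
and its _uXXXX escape format are hypothetical and are not part of Avro. The
sketch keeps ASCII letters, digits, and underscores, and escapes everything
else, including a leading digit. It does not handle Java keywords or empty
names, and it is not collision-free: a name that already contains a literal
"_u00E9" would clash with an escaped one.

```java
// Hypothetical helper, not part of Avro: maps an arbitrary schema name
// to a string made only of ASCII identifier characters.
final class NameMangler {
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            boolean letter = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || cp == '_';
            boolean digit = cp >= '0' && cp <= '9';
            if (letter || (digit && sb.length() > 0)) {
                // Safe to copy through unchanged.
                sb.appendCodePoint(cp);
            } else {
                // Escape rather than fail: emit the code point in hex.
                sb.append("_u").append(String.format("%04X", cp));
            }
            // Advance by one code point (two chars for supplementary ones).
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With this sketch an all-ASCII name such as "user_id" passes through unchanged,
while a name with a non-ASCII letter has only that letter replaced by its
escaped code point.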
> If we went the other way (change the spec), we'd have to answer a bunch of
> design questions (decide what is a "letter," decide on normalization,
> figure out how to mangle names in various languages, etc.), and then
> implement validation in each language [ ... ]
I disagree. Even if we removed all restrictions on naming I don't think we'd
add much burden to implementations. Most implementations don't do code
generation. Code generators already need to mangle names. A code generator
should already escape rather than die when it sees an unexpected character in a
name. (The alternative is an inability to generate code for schemas that
someone else controls, a poor choice.)
So I don't see a new interoperability problem this would create. We already
have schemas in the wild whose names are invalid.
Perhaps we should change the spec to recommend that names be restricted to
ASCII for ease of programming with generated APIs in all languages. And we
might check that in the compiler, forcing folks to specify
--escape-non-ASCII-names if they really want to generate code for a schema
whose names contain non-ASCII characters. That would discourage the use of
non-ASCII in schemas that you do control.
In general we could encourage implementations both to not trust that
identifiers are all-ASCII and to encourage all-ASCII identifiers.
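
A rough sketch of what such a compiler-side check might look like. The class
and method names here are hypothetical, and the boolean parameter stands in
for the proposed --escape-non-ASCII-names option; none of this is existing
Avro code. The ASCII test follows the name grammar in the Avro specification,
[A-Za-z_][A-Za-z0-9_]*.

```java
// Hypothetical compiler-side check, not existing Avro code.
final class AsciiNameCheck {
    // True if the name matches [A-Za-z_][A-Za-z0-9_]*.
    static boolean isAsciiName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || c == '_';
            boolean digit = c >= '0' && c <= '9';
            // Digits are allowed everywhere except the first position.
            if (!(letter || (digit && i > 0))) {
                return false;
            }
        }
        return true;
    }

    // Refuses to generate code for a non-ASCII name unless the caller
    // opted in; escapeNonAsciiNames models the proposed flag.
    static void checkForCodegen(String name, boolean escapeNonAsciiNames) {
        if (!isAsciiName(name) && !escapeNonAsciiNames) {
            throw new IllegalArgumentException(
                    "Name \"" + name + "\" is not all-ASCII; pass "
                    + "--escape-non-ASCII-names to generate code anyway");
        }
    }
}
```

The check only gates code generation. Reading data with such a schema through
the generic APIs would be unaffected.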
> Error in validate name
> ----------------------
>
> Key: AVRO-1022
> URL: https://issues.apache.org/jira/browse/AVRO-1022
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Raymie Stata
> Priority: Minor
> Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.