[ https://issues.apache.org/jira/browse/JENA-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054362#comment-15054362 ]
Andy Seaborne commented on JENA-1071:
-------------------------------------
The same operation, `ParserSupport.checkXMLName`, is used to check both `rdf:ID`
and `rdf:nodeID` values. Only the `rdf:ID` check should change.
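A minimal sketch of how the two checks could be separated so that only the `rdf:ID` rule changes. `RdfNameChecks` and its `Attr` enum are hypothetical names, not Jena's `ParserSupport` API. The actual name tests are passed in as predicates instead of calling Xerces, so the sketch compiles on its own.

```java
import java.util.Objects;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: one name check per RDF/XML attribute, so the rdf:ID
 * rule can change without affecting rdf:nodeID. Not Jena's ParserSupport API.
 */
final class RdfNameChecks {

    /** The two attributes that currently share a single name check. */
    public enum Attr { RDF_ID, RDF_NODE_ID }

    private final Predicate<String> idCheck;     // rule applied to rdf:ID values
    private final Predicate<String> nodeIdCheck; // rule applied to rdf:nodeID values

    RdfNameChecks(Predicate<String> idCheck, Predicate<String> nodeIdCheck) {
        this.idCheck = Objects.requireNonNull(idCheck);
        this.nodeIdCheck = Objects.requireNonNull(nodeIdCheck);
    }

    /** Returns true if the value is acceptable for the given attribute. */
    public boolean isValid(Attr attr, String value) {
        switch (attr) {
            case RDF_ID:
                return idCheck.test(value);
            case RDF_NODE_ID:
                return nodeIdCheck.test(value);
            default:
                throw new IllegalArgumentException("unknown attribute: " + attr);
        }
    }
}
```

In Jena the two predicates would presumably be Xerces's `XML11Char::isXML11ValidNCName` for `rdf:ID` and the existing `XMLChar::isValidNCName` for `rdf:nodeID`, which leaves `rdf:nodeID` behaviour unchanged.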
> Warnings when using XML 1.0 5th edition codepoints in rdf:ID.
> -------------------------------------------------------------
>
> Key: JENA-1071
> URL: https://issues.apache.org/jira/browse/JENA-1071
> Project: Apache Jena
> Issue Type: Bug
> Reporter: Andy Seaborne
> Priority: Minor
>
> Reported on users@: https://pony-poc.apache.org/thread.html/Znx1topkrk8ykbr
> Workaround:
> * Use {{rdf:about}}
> * Ignore or disable the warning
> The character causing the warning is [Character
> U+0370|http://www.fileformat.info/info/unicode/char/0370/index.htm] (Greek
> Capital Letter Heta). It was added to Unicode in version 5.1.
> https://en.wikipedia.org/wiki/Heta
> Other Greek letters, e.g. ΑΒ... (capital alpha and beta, from U+0391) and
> αβ... (lower case), are already valid name characters and are not affected.
> The Jena code {{ParserSupport.checkXMLName}} calls the Xerces method
> {{XMLChar.isValidNCName}}.
> Xerces implements the "XML 1.0 Fourth Edition" name rules, which do not
> permit U+0370. Java 8 also supports only XML 1.0 Fourth Edition.
> Both Xerces 2.11.0 (used by Jena since 2.10.1) and Java 8 support XML 1.1
> names with {{XML11Char.isXML11ValidNCName}}, which does accept this character.
> Jena could "upgrade" to XML11Char for the additional checks it performs.
> This would not amount to XML 1.1 support.
> Uses of {{XMLChar}}:
> * {{BaseXMLWriter.java}}
> * {{ParserSupport.java}}
> * {{Unparser.java}}
> * {{PrefixMappingImpl}} -- URI splitting?
> * {{Util}} -- controls URI splitting
> * {{schemagen}}
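For illustration, the name rule shared by XML 1.0 Fifth Edition and XML 1.1 (the `NameStartChar`/`NameChar` productions, with `:` removed for NCName) fits in a few lines. This is a standalone sketch written from those productions and assumed to accept the same names as Xerces's `XML11Char.isXML11ValidNCName`; it is not Jena or Xerces code, and `Xml11NCName` is a hypothetical name.

```java
/**
 * Sketch of the XML 1.0 Fifth Edition / XML 1.1 NCName test, written from the
 * NameStartChar and NameChar productions with ':' excluded.
 * Hypothetical helper, not part of Jena or Xerces.
 */
final class Xml11NCName {

    private Xml11NCName() {}

    // NameStartChar minus ':'
    static boolean isNameStartChar(int c) {
        return (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar minus ':'
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    /** Works on code points, so supplementary characters are handled; lone surrogates are rejected. */
    static boolean isValidNCName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int[] cps = s.codePoints().toArray();
        if (!isNameStartChar(cps[0])) {
            return false;
        }
        for (int i = 1; i < cps.length; i++) {
            if (!isNameChar(cps[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidNCName("\u0370abc"));   // true: U+0370 (Heta) starts the name
        System.out.println(isValidNCName("\u0391\u03B1")); // true: capital alpha, small alpha
        System.out.println(isValidNCName("ex:name"));      // false: ':' is not allowed in an NCName
        System.out.println(isValidNCName("1abc"));         // false: a digit cannot start a name
    }
}
```

U+0370 falls in the `[#x370-#x37D]` range, so it passes here, whereas the Fourth Edition rules behind `XMLChar.isValidNCName` reject it.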