[ 
https://issues.apache.org/jira/browse/JENA-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054362#comment-15054362
 ] 

Andy Seaborne commented on JENA-1071:
-------------------------------------

The same operation `ParserSupport.checkXMLName` is used to check `rdf:ID` and 
`rdf:nodeID`.  Only the `rdf:ID` check should change.

> Warnings when using XML 1.0 5th edition codepoints in rdf:ID.
> -------------------------------------------------------------
>
>                 Key: JENA-1071
>                 URL: https://issues.apache.org/jira/browse/JENA-1071
>             Project: Apache Jena
>          Issue Type: Bug
>            Reporter: Andy Seaborne
>            Priority: Minor
>
> Reported on users@: https://pony-poc.apache.org/thread.html/Znx1topkrk8ykbr 
> Workarounds: 
> * Use {{rdf:about}}
> * Ignore or disable the warning
> The character that triggers the warning is [Character 
> U+0370|http://www.fileformat.info/info/unicode/char/0370/index.htm] (Greek 
> Capital Heta). It was added to Unicode in version 5.1. 
> https://en.wikipedia.org/wiki/Heta
> Compare other Greek letters, e.g. ΑΒ... (capital letters alpha and beta, 
> from U+0391) and αβ... (lower case).
> Jena code {{ParserSupport.checkXMLName}} calls Xerces 
> {{XMLChar.isValidNCName}}.
> Xerces supports "XML 1.0 Fourth Edition", which does not permit U+0370.
> Java 8 also supports only XML 1.0 Fourth Edition.
> Both Xerces 2.11.0 (used by Jena since 2.10.1) and Java 8 support XML 1.1 
> via {{XML11Char.isXML11ValidNCName}}, which does accept this character.
> Jena could "upgrade" to using {{XML11Char}} for the additional checks it 
> performs.  This would not amount to XML 1.1 support.
> Uses of {{XMLChar}}:
> * {{BaseXMLWriter.java}}
> * {{ParserSupport.java}}
> * {{Unparser.java}}
> * {{PrefixMappingImpl}} -- URI splitting?
> * {{Util}} - controls URI splitting
> * {{schemagen}}
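
To illustrate the character check discussed in the description, here is a minimal, self-contained sketch of an NCName test using the name-character ranges from the XML 1.0 Fifth Edition `Name` production, with `:` excluded. The class name `Xml5eNCName` and its methods are hypothetical and written only for illustration; this is not the Xerces `XMLChar`/`XML11Char` code, nor what `ParserSupport.checkXMLName` does.

```java
public final class Xml5eNCName {
    private Xml5eNCName() {}

    /** NameStartChar ranges from XML 1.0 Fifth Edition, without ':' (NCName). */
    static boolean isNCNameStart(int cp) {
        return (cp >= 'A' && cp <= 'Z') || cp == '_' || (cp >= 'a' && cp <= 'z')
            || (cp >= 0xC0 && cp <= 0xD6) || (cp >= 0xD8 && cp <= 0xF6)
            || (cp >= 0xF8 && cp <= 0x2FF) || (cp >= 0x370 && cp <= 0x37D)
            || (cp >= 0x37F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D)
            || (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF)
            || (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF)
            || (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);
    }

    /** NameChar ranges from XML 1.0 Fifth Edition, without ':'. */
    static boolean isNCNameChar(int cp) {
        return isNCNameStart(cp) || cp == '-' || cp == '.'
            || (cp >= '0' && cp <= '9') || cp == 0xB7
            || (cp >= 0x300 && cp <= 0x36F) || (cp >= 0x203F && cp <= 0x2040);
    }

    /** True if s is a non-empty NCName under the Fifth Edition ranges. */
    static boolean isValidNCName(String s) {
        if (s == null || s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNCNameStart(first)) return false;
        // Walk by code point so supplementary characters are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNCNameChar(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+0370 (Greek Capital Heta) as the first character of an rdf:ID value.
        System.out.println(isValidNCName("\u0370ta"));
        // Capital alpha and beta (U+0391, U+0392).
        System.out.println(isValidNCName("\u0391\u0392"));
        // A colon is never allowed in an NCName.
        System.out.println(isValidNCName("a:b"));
    }
}
```

Under these ranges U+0370 is an acceptable first character, which is the behaviour the description expects from the XML 1.1 check.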



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
