[ https://issues.apache.org/jira/browse/JENA-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Cyganiak updated JENA-394:
----------------------------------
Attachment: japanese-chars.xml
I have attached a file with additional examples of this problem involving
other characters, all extracted from the Japanese DBpedia. Currently, working
with the Japanese DBpedia in Jena or Jena-based tools such as Pubby is
practically impossible due to this issue.
(Unfortunately, in the Turtle version of the same DBpedia data, the characters
are used in prefixed names, and this time Jena is correct in rejecting them
according to the Turtle spec, as they'd need to be escaped.)
> RDF/XML parser incorrectly disallows some Unicode characters
> ------------------------------------------------------------
>
> Key: JENA-394
> URL: https://issues.apache.org/jira/browse/JENA-394
> Project: Apache Jena
> Issue Type: Bug
> Components: RDF/XML
> Affects Versions: Jena 2.10.0
> Reporter: Richard Cyganiak
> Priority: Minor
> Attachments: japanese-chars.xml, katakana-middle-dot.xml
>
>
> The Unicode character 'KATAKANA MIDDLE DOT' (U+30FB) in the local part of a
> property name causes a parse exception in the RDF/XML parser. This seems to
> be incorrect, as the character is allowed in IRIs and is allowed in XML local
> names, as far as I can tell.
> Example file:
> <?xml version="1.0" encoding="utf-8" ?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns="http://example.com/ns#">
>   <rdf:Description rdf:about="#this">
>     <隣接自治体・行政区 rdf:resource="#that"/>
>   </rdf:Description>
> </rdf:RDF>
> The offending character is the “dot” in the middle of the property name.
> rdfcat execution with stack trace:
> $ bin/rdfcat ~/katakana-middle-dot.xml
> 18:09:37 ERROR riot :: Element type "?????" must be followed by either attribute specifications, ">" or "/>".
> Exception in thread "main" org.apache.jena.riot.RiotException: Element type "?????" must be followed by either attribute specifications, ">" or "/>".
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:132)
>     at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.fatalError(LangRDFXML.java:242)
>     at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:48)
>     at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:209)
>     at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:239)
>     at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>     at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:151)
>     at com.hp.hpl.jena.rdf.arp.ARP.load(ARP.java:119)
>     at org.apache.jena.riot.lang.LangRDFXML.parse(LangRDFXML.java:141)
>     at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:148)
>     at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:749)
>     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:258)
>     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:244)
>     at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:65)
>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:276)
>     at com.hp.hpl.jena.util.FileManager.readModelWorker(FileManager.java:403)
>     at com.hp.hpl.jena.util.FileManager.readModel(FileManager.java:342)
>     at jena.rdfcat.readInput(rdfcat.java:375)
>     at jena.rdfcat$ReadAction.run(rdfcat.java:552)
>     at jena.rdfcat.go(rdfcat.java:278)
>     at jena.rdfcat.main(rdfcat.java:260)
>
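The report's claim that U+30FB is allowed in XML local names can be checked against the Name productions of XML 1.0 (Fifth Edition). The following is a minimal sketch under that assumption, not Jena or Xerces code: the class `XmlNameCheck` and its methods are hypothetical names introduced for illustration, and the sketch does not model the older, enumerated character classes of earlier XML 1.0 editions, which a parser may still be following.

```java
// Hypothetical helper, not part of Jena or Xerces: checks names against the
// NameStartChar / NameChar productions of XML 1.0 (Fifth Edition).
final class XmlNameCheck {

    // Production [4] NameStartChar, as inclusive code point ranges.
    private static final int[][] NAME_START_RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Production [4a] NameChar: NameStartChar plus these extra ranges.
    private static final int[][] NAME_EXTRA_RANGES = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x300, 0x36F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNameStartChar(int cp) {
        return inRanges(cp, NAME_START_RANGES);
    }

    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA_RANGES);
    }

    // True if s matches production [5] Name: NameStartChar (NameChar)*.
    // Iterates by code point so supplementary characters are handled.
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The property name from the example file, written with Unicode
        // escapes; \u30FB is KATAKANA MIDDLE DOT.
        String name = "\u96A3\u63A5\u81EA\u6CBB\u4F53\u30FB\u884C\u653F\u533A";
        System.out.println("U+30FB is a NameChar: " + isNameChar(0x30FB));
        System.out.println("Property name is a Name: " + isName(name));
    }
}
```

Under these Fifth Edition rules, every character of the property name, including U+30FB, falls in the range U+3001 to U+D7FF, so the name is accepted. This checks only the plain Name production; it says nothing about which edition of the XML specification the parser in the stack trace implements.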