Stian Soiland-Reyes created JENA-1462:
-----------------------------------------

             Summary: RDF/XML parsing fails on newer/provisional/private URI schemes in base URI
                 Key: JENA-1462
                 URL: https://issues.apache.org/jira/browse/JENA-1462
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ, RDF/XML
    Affects Versions: Jena 3.6.0, Jena 3.5.0, Jena 3.4.0, Jena 3.3.0
         Environment: Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.10.0-42-generic", arch: "amd64", family: "unix"

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial


            Reporter: Stian Soiland-Reyes


RIOT fails to parse RDF/XML when the base URI uses a scheme other than 
http/https/file, such as ssh://.

See https://github.com/stain/jena-test-unregistered-iana for some tests I came 
up with.

Tests fail both when the base is set with xml:base and when the base URI is 
provided to RDFDataMgr, but not when the RDF/XML contains the full URI.


{code}
org.apache.jena.riot.RiotException: [line: 5, col: 40] {E214} Resolving against bad URI <ssh://example.com/nested/>: <foo.txt>
        at org.apache.jena.riot.TestParseURISchemeBases.sshBaseRDF(TestParseURISchemeBases.java:336)
{code}

This error message comes from ERR_RESOLVING_AGAINST_MALFORMED_BASE. The 
warning is promoted to an error, seemingly because the IRI Factory used for 
creating the base IRI within the RDF/XML parser is a bit too strict.

However I could not find anything in the specs:

* https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/
* https://www.w3.org/TR/2009/REC-xmlbase-20090128/
* https://www.ietf.org/rfc/rfc3986

that says "foreign" URI schemes should not be permitted. In any case, Jena's 
IANA list is probably out of date, as my tests show.

This was initially detected in TAVERNA-1027, which tries to parse an RDF/XML 
document with the [app:// URI scheme|https://www.w3.org/TR/app-uri/], which is 
*not* registered in the IANA registry 
https://www.iana.org/assignments/uri-schemes maintained according to 
https://tools.ietf.org/html/bcp35

However, testing Jena with other permanent and provisional schemes from the 
registry, such as example:// and ssh://, or with a conformant private scheme 
with a domain-based name (org.apache.jena.test://), gives the same error.

IMHO they should all be understood in the same way as when parsing the Turtle 
examples, which don't fail.
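
For comparison, generic reference resolution as described in RFC 3986 does not 
depend on the scheme. The following is a minimal sketch using java.net.URI 
from the JDK, purely as a neutral illustration: it is not the IRI code Jena 
uses, the class name BaseResolution is made up, and all base URIs other than 
ssh://example.com/nested/ are invented for the example.

```java
import java.net.URI;

// Hypothetical helper, not part of Jena: resolves a relative reference
// against a base URI using only the JDK's java.net.URI.
class BaseResolution {

    // Resolve 'relative' against 'base' with the JDK's hierarchical URI rules.
    public static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Only the ssh:// base appears in the error above; the others are
        // assumed examples of registered, unregistered and private schemes.
        String[] bases = {
            "http://example.com/nested/",
            "ssh://example.com/nested/",
            "example://example.com/nested/",
            "app://example.com/nested/",
            "org.apache.jena.test://example.com/nested/"
        };
        for (String base : bases) {
            // Each line printed is the base with "foo.txt" appended to its path.
            System.out.println(resolve(base, "foo.txt"));
        }
    }
}
```

In this sketch every base resolves the same way, for instance 
ssh://example.com/nested/foo.txt, which is the behaviour the Turtle parser 
already shows.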

I could trace this back to Jena 3.3.0, so I suspect this was introduced with 
JENA-1306. With versions before that, all my tests *) pass.

I'll raise a pull request with the junit tests, but have not been able to find 
a good way to fix it.

_*) There's a separate issue, which I'll report separately: hostnames in 
file://example.com/etc/passwd style URIs also seem to be misparsed in RDF/XML 
into file:///example.com/etc/passwd. That one goes back to 3.0.1._
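
To make the difference between the two file URIs concrete, here is a small 
sketch, again using java.net.URI from the JDK rather than Jena's own IRI code 
(the class name FileUriHost is made up). It shows that the first form carries 
a host, while in the second form the host name has become part of the path.

```java
import java.net.URI;

// Hypothetical helper, not part of Jena: exposes the host and path
// components of a URI as parsed by the JDK.
class FileUriHost {

    // Returns the host component, or null when the authority is empty.
    public static String host(String uri) {
        return URI.create(uri).getHost();
    }

    // Returns the path component.
    public static String path(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String withHost = "file://example.com/etc/passwd";
        String withoutHost = "file:///example.com/etc/passwd";

        // host = example.com, path = /etc/passwd
        System.out.println(host(withHost) + " " + path(withHost));
        // host = null, path = /example.com/etc/passwd
        System.out.println(host(withoutHost) + " " + path(withoutHost));
    }
}
```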



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
