Joachim Neubert created JENA-996:
------------------------------------

             Summary: riot should recognize invalid URIs in large jsonld files
                 Key: JENA-996
                 URL: https://issues.apache.org/jira/browse/JENA-996
             Project: Apache Jena
          Issue Type: Bug
          Components: RIOT
    Affects Versions: Jena 2.13.0
            Reporter: Joachim Neubert


When running riot --validate on large jsonld files, URIs that contain 
whitespace are not flagged, so the files seem to be valid.

However, when loading such a file with Fuseki's tdbloader, loading aborts with:

ERROR [line: 31572453, col: 1 ] Broken token (newline): Public consulting 
services for smaller comme
03:06:01 WARN  riot                 :: Bad IRI: 
<http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single 
whitespace character. These match no grammar rules of URIs/IRIs. These 
characters are permitted in RDF URI References, XML system identifiers, and XML 
Schema anyURIs.

Unfortunately, it seems that the error cannot be reproduced on small files. 
The isolated sequence:
{code}
{
  "@context":
  {
    "eb": "http://zbw.eu/beta/resource/title/",
    "doi": "http://dx.doi.org/",
    "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
  },
  "@graph":
[
{
   "@id" : "eb:10003656538",
   "identifier_doi" : [
      "doi:DOI 10.2767/59617"
   ]
}
]
}
{code}

produces the following warning:

14:13:52 WARN  riot                 :: Bad IRI: <http://dx.doi.org/DOI 
10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character. 
These match no grammar rules of URIs/IRIs. These characters are permitted in 
RDF URI References, XML system identifiers, and XML Schema anyURIs.
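
Until riot flags these in large files, a standalone pre-check could locate the 
affected values. The following is only a minimal sketch in Python and is not 
part of Jena. The function names (expand, find_whitespace_iris) are 
hypothetical, and it assumes a layout like the sample above: a top-level 
"@context" with plain prefix mappings, terms declared with "@type": "@id", and 
nodes listed under "@graph". It does not implement full JSON-LD expansion.

```python
import json
import re

WHITESPACE = re.compile(r"\s")


def expand(value, prefixes):
    """Expand a compact IRI such as 'doi:xyz' using simple prefix mappings."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return value


def find_whitespace_iris(doc):
    """Return (node id, term, expanded IRI) for IRI values containing whitespace."""
    context = doc.get("@context", {})
    # Assumption: plain string entries in the context are prefix mappings.
    prefixes = {k: v for k, v in context.items() if isinstance(v, str)}
    # Assumption: only terms declared with "@type": "@id" hold IRIs.
    iri_terms = {
        k for k, v in context.items()
        if isinstance(v, dict) and v.get("@type") == "@id"
    }
    hits = []
    for node in doc.get("@graph", []):
        for term, values in node.items():
            if term not in iri_terms:
                continue
            if not isinstance(values, list):
                values = [values]
            for value in values:
                if not isinstance(value, str):
                    continue
                iri = expand(value, prefixes)
                if WHITESPACE.search(iri):
                    hits.append((node.get("@id"), term, iri))
    return hits


# The isolated sequence from this report.
SAMPLE = json.loads("""
{
  "@context": {
    "eb": "http://zbw.eu/beta/resource/title/",
    "doi": "http://dx.doi.org/",
    "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
  },
  "@graph": [
    { "@id": "eb:10003656538",
      "identifier_doi": [ "doi:DOI 10.2767/59617" ] }
  ]
}
""")

if __name__ == "__main__":
    for node_id, term, iri in find_whitespace_iris(SAMPLE):
        print(f"{node_id} {term}: <{iri}>")
```

For a real file, the parsed result of json.load on the opened file would be 
passed in place of SAMPLE. This loads the whole document into memory.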

I can make the large file (econis_0037.json, 59 MB) available for download.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
