[
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638184#comment-14638184
]
Joachim Neubert commented on JENA-996:
--------------------------------------
You are right: I piped to stdout and, in doing so, concatenated multiple
jsonld files into one large ntriples file. (Sorry for the incomplete initial
description. Due to disk space limitations, I deleted the defective intermediate
file.)
I could reproduce the issue with
{code}
JVM_ARGS="-Xmx4G" /opt/jena/bin/riot --syntax=jsonld --strict --check \
  --output=ntriples econis_0037.json >econis_0037.nt
{code}
No warning is given on stderr. The ntriples file contains the following four
lines (for clarity, I've inserted additional newlines):
{code}
<http://zbw.eu/beta/resource/title/10003656533>
<http://purl.org/dc/terms/source> <http://zbw.eu/beta/ebds/source/econis> .
<http://zbw.eu/beta/resource/title/10003656533>
<http://purl.org/dc/terms/title> "Public consulting services for smaller
comme06:41:04 WARN riot :: Bad IRI: <http://dx.doi.org/DOI
10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character.
These match no grammar rules of URIs/IRIs. These characters are permitted in
RDF URI References, XML system identifiers, and XML Schema anyURIs.
rcial enterprises of Japan" .
<http://zbw.eu/beta/resource/title/10003656533> <http://umbel.org/umbel#isLike>
<http://www.worldcat.org/oclc/254377811> .
{code}
Up to the first cited line and from the fourth onward, the riot output is
correct; the second and third are garbled, with the warning message written
into the middle of a literal. So tdbloader was already given defective data.
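A rough sketch of how such damaged lines could be spotted before the file is handed to tdbloader. This is not part of Jena; the function name `find_bad_lines` and the regular expressions are assumptions made for illustration, and the pattern is only a loose approximation of the N-Triples grammar (IRIs without whitespace, double-quoted literals with backslash escapes).

```python
import re

# Loose building blocks for one N-Triples statement (assumption: this is an
# approximation for sanity checking, not a full N-Triples parser).
IRI = r'<[^<>\s]*>'                      # IRI with no whitespace inside
BNODE = r'_:\S+'                         # blank node label
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^' + IRI + r')?'

TRIPLE = re.compile(
    r'^\s*(?:%s|%s)\s+%s\s+(?:%s|%s|%s)\s*\.\s*$'
    % (IRI, BNODE, IRI, IRI, BNODE, LITERAL)
)


def find_bad_lines(lines):
    """Return (line_number, text) for each line that does not look like one triple.

    Blank lines and comment lines starting with '#' are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip() or text.lstrip().startswith("#"):
            continue
        if not TRIPLE.match(text):
            bad.append((number, text))
    return bad
```

Called as `find_bad_lines(open("econis_0037.nt", encoding="utf-8"))`, it would be expected to report the two lines into which the warning was written, since one lacks the closing quote and the other does not start with a subject.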
I attach a file with the invalid jsonld "record". On its own, however, it
triggers a correct warning with "riot --validate". The intermingled output only
occurs with the large file, which can be downloaded at
http://zbw.eu/beta/tmp/econis_0037.json.
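Independently of riot, the offending records could be located in the input with a small pre-scan. The following is only a sketch under strong assumptions: it does no real JSON-LD processing, handles just a single top-level object with an inline "@context" and a flat "@graph", and treats a term as IRI-valued only if its context entry declares "@type": "@id". The function names are hypothetical.

```python
import json


def id_typed_terms(context):
    """Collect term names whose context definition declares "@type": "@id"."""
    return {
        term
        for term, definition in context.items()
        if isinstance(definition, dict) and definition.get("@type") == "@id"
    }


def find_whitespace_ids(document):
    """Return (node @id, term, value) for IRI-valued strings containing whitespace."""
    context = document.get("@context", {})
    terms = id_typed_terms(context) if isinstance(context, dict) else set()
    hits = []
    for node in document.get("@graph", []):
        if not isinstance(node, dict):
            continue
        for term in sorted(terms):
            if term not in node:
                continue
            # A term may hold a single value or a list of values.
            values = node[term] if isinstance(node[term], list) else [node[term]]
            for value in values:
                if isinstance(value, str) and any(ch.isspace() for ch in value):
                    hits.append((node.get("@id"), term, value))
    return hits
```

For a single record this could be used as `find_whitespace_ids(json.load(open("record.json", encoding="utf-8")))`, where `record.json` is a placeholder file name. A file consisting of several concatenated JSON documents would need to be split first.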
> riot should recognize invalid URIs in large jsonld files
> --------------------------------------------------------
>
> Key: JENA-996
> URL: https://issues.apache.org/jira/browse/JENA-996
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT
> Affects Versions: Jena 2.13.0
> Reporter: Joachim Neubert
>
> With riot --validate, URIs containing whitespace in large jsonld files are
> not flagged; the files seem to be valid.
> However, when loading this with Fuseki's tdbloader, loading aborts with
> ERROR [line: 31572453, col: 1 ] Broken token (newline): Public consulting
> services for smaller comme03:06:01 WARN riot :: Bad IRI:
> <http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single
> whitespace character. These match no grammar rules of URIs/IRIs. These
> characters are permitted in RDF URI References, XML system identifiers, and
> XML Schema anyURIs.
> Unfortunately, it seems that the error cannot be reproduced on small files.
> The isolated sequence:
> {code}
> {
> "@context":
> {
> "eb": "http://zbw.eu/beta/resource/title/",
> "doi": "http://dx.doi.org/",
> "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
> },
> "@graph":
> [
> {
> "@id" : "eb:10003656538",
> "identifier_doi" : [
> "doi:DOI 10.2767/59617"
> ]
> }
> ]
> }
> {code}
> creates a message:
> 14:13:52 WARN riot :: Bad IRI: <http://dx.doi.org/DOI
> 10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character.
> These match no grammar rules of URIs/IRIs. These characters are permitted in
> RDF URI References, XML system identifiers, and XML Schema anyURIs.
> I can make the large file (econis_0037.json, 59 MB) available for download.