[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638256#comment-14638256
 ] 

Joachim Neubert edited comment on JENA-996 at 7/23/15 6:26 AM:
---------------------------------------------------------------

Here are another four lines from the N-Triples output for 
http://zbw.eu/beta/tmp/econis_0040.json :
{code}
<http://zbw.eu/beta/resource/title/10003935684> 
<http://purl.org/dc/terms/subject> <http://zbw.eu/stw/descriptor/17983-5> .

<http://zbw.eu/beta/resource/title/10003935684> 
<http://purl.org/dc/terms/title> "Herausforderungen an Compliance hinsichtlich 
der Korruptionsbekämpfung in der öffentli

chen Verwaltung" .

<http://zbw.eu/beta/resource/title/10003935685> 
<http://purl.org/dc/elements/1.1/creator> "Stegh, Michael" .
<http://zbw.eu/beta/re08:01:22 WARN  riot                 :: Bad IRI: 
<http://nbn-resolving.de/urn:nbn:de:bvb:19-epub-13249-6Tournaments; 
Heterogeneity; Incentives; Effort> Code: 17/WHITESPACE in PATH: A single 
whitespace character. These match no grammar rules of URIs/IRIs. These 
characters are permitted in RDF URI References, XML system identifiers, and XML 
Schema anyURIs.

<http://zbw.eu/beta/resource/title/10003935685> 
<http://purl.org/dc/elements/1.1/language> "eng" .
{code}
The invalid sequence in the source file is:
{code}
   "identifier_urn" : [
      "urnid:urn:nbn:de:bvb:19-epub-13249-6Tournaments; Heterogeneity; 
Incentives; Effort"
   ],
{code}
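Until riot flags these itself, a pre-check on the source JSON can catch such values before conversion. The following is only a minimal sketch, not part of Jena: the helper names `expand` and `find_bad_iris` are hypothetical, the `urnid` prefix mapping is an assumption inferred from the Bad IRI warning above, and the `doi` mapping is taken from the @context in the issue description. It assumes the file fits in memory and that only the listed keys hold compact IRIs.

```python
import re

# Assumed prefix map: "urnid" is inferred from the warning output,
# "doi" comes from the @context quoted in the issue description.
PREFIXES = {
    "urnid": "http://nbn-resolving.de/",
    "doi": "http://dx.doi.org/",
}

WHITESPACE = re.compile(r"\s")


def expand(value, prefixes=PREFIXES):
    """Expand a compact IRI such as 'doi:10.1/x'; return other values unchanged."""
    prefix, sep, rest = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + rest
    return value


def find_bad_iris(node, keys, path="$"):
    """Yield (path, iri) for values under `keys` whose expanded IRI contains whitespace."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in keys:
                # These properties hold a single string or a list of strings.
                values = value if isinstance(value, list) else [value]
                for i, item in enumerate(values):
                    if isinstance(item, str):
                        iri = expand(item)
                        if WHITESPACE.search(iri):
                            yield f"{child}[{i}]", iri
            else:
                yield from find_bad_iris(value, keys, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_bad_iris(item, keys, f"{path}[{i}]")
```

To use it, load the file with `json.load` and iterate over `find_bad_iris(doc, {"identifier_urn", "identifier_doi"})`; each hit gives the position in the document and the expanded IRI that riot would complain about.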





> riot should recognize invalid URIs in large jsonld files
> --------------------------------------------------------
>
>                 Key: JENA-996
>                 URL: https://issues.apache.org/jira/browse/JENA-996
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: RIOT
>    Affects Versions: Jena 2.13.0
>            Reporter: Joachim Neubert
>         Attachments: defect_doi.jsonld
>
>
> With riot --validate, URIs that include whitespace are not flagged in large 
> jsonld files; the files seem to be valid.
> However, when loading such a file with Fuseki's tdbloader, loading aborts with
> ERROR [line: 31572453, col: 1 ] Broken token (newline): Public consulting 
> services for smaller comme03:06:01 WARN  riot                 :: Bad IRI: 
> <http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single 
> whitespace character. These match no grammar rules of URIs/IRIs. These 
> characters are permitted in RDF URI References, XML system identifiers, and 
> XML Schema anyURIs.
> Unfortunately, it seems that the error cannot be reproduced on small files. 
> The isolated sequence:
> {code}
> {
>   "@context":
>   {
>     "eb": "http://zbw.eu/beta/resource/title/",
>     "doi": "http://dx.doi.org/",
>     "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
>   },
>   "@graph":
> [
> {
>    "@id" : "eb:10003656538",
>    "identifier_doi" : [
>       "doi:DOI 10.2767/59617"
>    ]
> }
> ]
> }
> {code}
> creates a message:
> 14:13:52 WARN  riot                 :: Bad IRI: <http://dx.doi.org/DOI 
> 10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character. 
> These match no grammar rules of URIs/IRIs. These characters are permitted in 
> RDF URI References, XML system identifiers, and XML Schema anyURIs.
> I can make the large file (econis_0037.json, 59 MB) available for download.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
