[ https://issues.apache.org/jira/browse/JENA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712780#comment-17712780 ]
Jan Martin Keil commented on JENA-2351:
---------------------------------------
The IRIs with newlines
# originate (to my knowledge) from a [bug in the DBpedia extraction framework|https://github.com/dbpedia/extraction-framework/issues/748],
# are [shipped by the Virtuoso SPARQL endpoint with the newline encoded as {{\u000A}}|https://dbpedia.org/sparql/?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FNeutron_Star_Interior_Composition_Explorer%3E&format=text%2Fturtle],
# are fetched into Jena with {{QueryExecution.service(...)..query(..).build().execConstruct(Model)}} without further notice,
# are represented internally with a raw newline character ({{\n}}),
# are serialized with that raw newline, and
# are then rejected by {{Model#read}}.
So newlines in IRIs are handled inconsistently within Jena: the same IRI is accepted when fetched, written out, and then refused when read back. My
[workaround|https://github.com/fusion-jena/abecto/commit/19392f51d5bd467776f1167ac818a033ba630d6d]
is to log and skip statements with newlines in IRIs right after fetching them.
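A minimal sketch of such a log-and-skip step, assuming the CONSTRUCT result already sits in a Jena {{Model}}. The class {{NewlineIriFilter}} and its methods are hypothetical names chosen for illustration; this is not the code of the linked commit. It only looks for U+000A in subject, predicate, and object IRIs and ignores other control characters.
{code:java}
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Statement;

import java.util.List;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Jena): drops statements whose IRIs contain a newline. */
public final class NewlineIriFilter {

    private static final Logger LOG = Logger.getLogger(NewlineIriFilter.class.getName());

    private NewlineIriFilter() {
    }

    /** True if the node is an IRI resource whose IRI contains a raw newline (U+000A). */
    static boolean hasNewlineIri(RDFNode node) {
        return node.isURIResource() && node.asResource().getURI().indexOf('\n') >= 0;
    }

    /** Returns a copy of the model without the offending statements, logging each skipped one. */
    public static Model withoutNewlineIris(Model input) {
        Model output = ModelFactory.createDefaultModel();
        output.setNsPrefixes(input); // keep the prefix mapping of the input
        List<Statement> statements = input.listStatements().toList();
        for (Statement st : statements) {
            if (hasNewlineIri(st.getSubject())
                    || hasNewlineIri(st.getPredicate())
                    || hasNewlineIri(st.getObject())) {
                LOG.warning("Skipping statement with newline in IRI: " + st);
            } else {
                output.add(st);
            }
        }
        return output;
    }
}
{code}
Applied right after {{execConstruct(...)}}, a filter like this should keep the affected IRIs from ever reaching a writer, at the price of dropping those statements (each one is logged via {{java.util.logging}}).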
No question, newlines should not exist in IRIs. But escaped as a UCHAR, they at
least satisfy the Turtle grammar, and Jena should handle them consistently.
Therefore, escaping newlines as {{\u000A}} and logging a warning might be the best
option. Reading such an escaped IRI from a file already works and only triggers a warning:
{code:java}
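// Method for the Example class below; additionally needs: import java.nio.charset.StandardCharsets;
@Test
public void ttl() throws IOException {
    // One triple whose object IRI contains a newline written as a Turtle UCHAR escape
    String ttl = "<http://dbpedia.org/resource/Lunar_Reconnaissance_Orbiter> "
            + "<http://xmlns.com/foaf/0.1/depiction> "
            + "<http://commons.wikimedia.org/wiki/Special:FilePath/\\u000ALRO_WAC_North_Pole_Mosaic_(PIA14024).jpg> .";
    File file = File.createTempFile("example", ".ttl");
    try (FileOutputStream out = new FileOutputStream(file)) {
        out.write(ttl.getBytes(StandardCharsets.UTF_8));
    }
    try (FileInputStream in = new FileInputStream(file)) {
        System.out.println(ModelFactory.createDefaultModel().read(in, "", "TTL"));
    }
}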
{code}
Output:
{code:java}
Apr. 16, 2023 3:02:09 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logWarning
WARNING: [line: 1, col: 98] Bad IRI: <http://commons.wikimedia.org/wiki/Special:FilePath/
LRO_WAC_North_Pole_Mosaic_(PIA14024).jpg> Code: 5/CONTROL_CHARACTER in PATH: Control characters are not allowed in URIs or RDF URI References.
<ModelCom {http://dbpedia.org/resource/Lunar_Reconnaissance_Orbiter @http://xmlns.com/foaf/0.1/depiction http://commons.wikimedia.org/wiki/Special:FilePath/
LRO_WAC_North_Pole_Mosaic_(PIA14024).jpg} | [http://dbpedia.org/resource/Lunar_Reconnaissance_Orbiter, http://xmlns.com/foaf/0.1/depiction, http://commons.wikimedia.org/wiki/Special:FilePath/
LRO_WAC_North_Pole_Mosaic_(PIA14024).jpg]>
{code}
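The escaping proposed above could look roughly like the following when a writer emits an IRI as an IRIREF token. This is a sketch, not Jena's writer code: {{IriUcharEscaper}} is a hypothetical helper, and it assumes that every character the Turtle/N-Triples IRIREF production excludes (control characters, space, and a few punctuation characters listed in the code comment) should be written as a {{\uXXXX}} UCHAR instead of being emitted raw.
{code:java}
/**
 * Hypothetical helper (not part of Jena): writes an IRI as an N-Triples/Turtle
 * IRIREF token, escaping the characters that IRIREF does not allow as UCHARs.
 */
public final class IriUcharEscaper {

    private IriUcharEscaper() {
    }

    // IRIREF excludes U+0000..U+0020 and the characters < > " { } | ^ ` and backslash
    private static boolean needsEscape(char c) {
        return c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0;
    }

    /** Replaces every excluded character with a four-digit UCHAR escape. */
    public static String escape(String iri) {
        StringBuilder sb = new StringBuilder(iri.length());
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (needsEscape(c)) {
                // e.g. a newline (0x0A) becomes a backslash, 'u' and "000A"
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps the escaped IRI in angle brackets, ready for an N-Triples line. */
    public static String toIriRef(String iri) {
        return "<" + escape(iri) + ">";
    }
}
{code}
With this helper, the IRI used in the issue's tests would be written as {{<http://example.org/aaa/\u000Abbb>}}. As the output above shows, Jena's Turtle reader already loads such an escaped IRI, with a warning instead of a fatal error.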
> Newline (U+000A) in IRIs not escaped during NT/TTL/NQ/TRIG serialization
> -------------------------------------------------------------------------
>
> Key: JENA-2351
> URL: https://issues.apache.org/jira/browse/JENA-2351
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT
> Affects Versions: Jena 4.7.0
> Reporter: Jan Martin Keil
> Priority: Major
>
> [Newline characters (U+000A) in IRIs|https://github.com/dbpedia/extraction-framework/issues/748]
> are not escaped during the serialization of models or datasets into formats
> of the Turtle family. This results in invalid files that Jena itself cannot
> read back. Please note the following tests:
> {code:java}
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.rdf.model.*;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFDataMgr;
> import org.junit.jupiter.api.Test;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
>
> public class Example {
>
>     @Test
>     public void rdfXml() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         // subject IRI containing a raw newline (U+000A)
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         System.out.println("\nRDF/XML:\n");
>         model.write(System.out, "RDF/XML");
>         // test write and read
>         File file = File.createTempFile("example", ".rdf");
>         try (FileOutputStream out = new FileOutputStream(file)) {
>             model.write(out, "RDF/XML");
>         }
>         try (FileInputStream in = new FileInputStream(file)) {
>             ModelFactory.createDefaultModel().read(in, "", "RDF/XML");
>         }
>     }
>
>     @Test
>     public void ttl() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         System.out.println("\nTTL:\n");
>         model.write(System.out, "TTL");
>         // test write and read
>         File file = File.createTempFile("example", ".ttl");
>         try (FileOutputStream out = new FileOutputStream(file)) {
>             model.write(out, "TTL");
>         }
>         try (FileInputStream in = new FileInputStream(file)) {
>             ModelFactory.createDefaultModel().read(in, "", "TTL");
>         }
>     }
>
>     @Test
>     public void nTriples() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model = ModelFactory.createDefaultModel();
>         model.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         System.out.println("\nN-TRIPLE:\n");
>         model.write(System.out, "N-TRIPLE");
>         // test write and read
>         File file = File.createTempFile("example", ".nt");
>         try (FileOutputStream out = new FileOutputStream(file)) {
>             model.write(out, "N-TRIPLE");
>         }
>         try (FileInputStream in = new FileInputStream(file)) {
>             ModelFactory.createDefaultModel().read(in, "", "N-TRIPLE");
>         }
>     }
>
>     @Test
>     public void nq() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph", model2);
>         System.out.println("\nNQ:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.NQ);
>         // test write and read
>         File file = File.createTempFile("example", ".nq");
>         try (FileOutputStream out = new FileOutputStream(file)) {
>             RDFDataMgr.write(out, dataset, Lang.NQ);
>         }
>         try (FileInputStream in = new FileInputStream(file)) {
>             RDFDataMgr.read(DatasetFactory.createGeneral(), in, Lang.NQ);
>         }
>     }
>
>     @Test
>     public void trig() throws IOException {
>         Property someProperty = ResourceFactory.createProperty("http://example.org/property");
>         Model model1 = ModelFactory.createDefaultModel();
>         model1.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         Model model2 = ModelFactory.createDefaultModel();
>         model2.createResource("http://example.org/aaa/\nbbb").addProperty(someProperty, "a string");
>         Dataset dataset = DatasetFactory.createGeneral();
>         dataset.setDefaultModel(model1);
>         dataset.addNamedModel("http://example.org/namedGraph", model2);
>         System.out.println("\nTRIG:\n");
>         RDFDataMgr.write(System.out, dataset, Lang.TRIG);
>         // test write and read
>         File file = File.createTempFile("example", ".trig");
>         try (FileOutputStream out = new FileOutputStream(file)) {
>             RDFDataMgr.write(out, dataset, Lang.TRIG);
>         }
>         try (FileInputStream in = new FileInputStream(file)) {
>             RDFDataMgr.read(DatasetFactory.createGeneral(), in, Lang.TRIG);
>         }
>     }
> }
> {code}
> Outputs (stack traces truncated):
> {code:java}
> N-TRIPLE:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> RDF/XML:
> <rdf:RDF
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:j.0="http://example.org/">
> <rdf:Description rdf:about="http://example.org/aaa/
> bbb">
> <j.0:property>a string</j.0:property>
> </rdf:Description>
> </rdf:RDF>
> {code}
> {code:java}
> NQ:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" <http://example.org/namedGraph> .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> TTL:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
> {code:java}
> TRIG:
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> <http://example.org/namedGraph> {
> <http://example.org/aaa/
> bbb> <http://example.org/property> "a string" .
> }
> Apr. 15, 2023 10:01:45 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
> SEVERE: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> org.apache.jena.riot.RiotException: [line: 2, col: 1 ] Broken IRI (newline): http://example.org/aaa/
> 	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
> ...
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)