Julian Gonggrijp created JENA-2061:
--------------------------------------
Summary: Fuseki XML result serializer outputs characters that are
illegal per XML spec
Key: JENA-2061
URL: https://issues.apache.org/jira/browse/JENA-2061
Project: Apache Jena
Issue Type: Bug
Components: Fuseki
Affects Versions: Jena 3.15.0
Environment: We confirmed this behavior in three environments:
* CentOS 8 with OpenJDK 1.8.0_282
* macOS 10.15 with OpenJDK 13.0.2
* macOS 10.14 with Java 8 JDK
Reporter: Julian Gonggrijp
Due to a mistake at our end, our application inserted a literal into the triple
store that included ASCII character {{0x001B}} (below represented as {{ESC}}):
{code:none}
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX our: <http://example.org/>
INSERT DATA {
our:example oa:exact "foo ESC bar" .
}
{code}
While this was unintentional and I can't really think of a situation where
inserting control characters is desirable, as far as I can tell this is
nevertheless allowed by the SPARQL and Turtle specifications. Please correct me
if I'm wrong. Regardless, Fuseki accepts this update request.
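For illustration, a client-side guard that would have caught the literal before it reached the store could look roughly like the sketch below. It is not part of Jena or Fuseki; the function name {{find_xml_illegal_chars}} is hypothetical, and the character ranges are taken from the {{Char}} production of XML 1.0.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_xml_illegal_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper for illustration only.
    """
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in _XML10_ILLEGAL.finditer(text)
    ]


if __name__ == "__main__":
    # The literal from this report, with a real ESC (0x1B) in place of "ESC".
    print(find_xml_illegal_chars("foo \x1b bar"))
```

An empty list means the literal can be serialized as XML 1.0 without problems; tab, line feed and carriage return are the only control characters below {{0x20}} that pass.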
When we subsequently retrieve the data through a {{SELECT}} query with the
{{Accept}} header set to {{application/sparql-results+xml}}, the XML response
contains the raw {{ESC}} character:
{code:none}
SELECT ?c WHERE { ?a ?b ?c . }
{code}
{code:xml}
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="c"/>
</head>
<results>
<result>
<binding name="c">
<literal>foo ESC bar</literal>
</binding>
</result>
</results>
</sparql>
{code}
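XML 1.0 does not allow {{U+001B}} anywhere in a document (it falls outside the
{{Char}} production), so conforming parsers reject the response. This leads to
errors when the result XML is parsed downstream.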
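The downstream failure can be reproduced without Fuseki. The sketch below uses Python's standard {{xml.etree.ElementTree}} parser as a stand-in for our actual consumer (an assumption for illustration); {{TEMPLATE}} and {{parses_as_xml}} are hypothetical names, and the document is a trimmed version of the response shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed SPARQL results document with a slot for the literal value.
TEMPLATE = (
    '<?xml version="1.0"?>'
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="c"/></head>'
    '<results><result><binding name="c">'
    "<literal>{}</literal>"
    "</binding></result></results>"
    "</sparql>"
)


def parses_as_xml(document):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(parses_as_xml(TEMPLATE.format("foo bar")))       # plain literal
    print(parses_as_xml(TEMPLATE.format("foo \x1b bar")))  # literal with raw ESC
```

Note that writing the character as a numeric character reference would not help either, because XML 1.0 requires referenced characters to match {{Char}} as well.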
If we do a {{CONSTRUCT}} with {{application/rdf+xml}}, the Fuseki server
returns a {{400 Bad Request}} instead, which I have double-checked is due to
the presence of the {{ESC}} character.