[
https://issues.apache.org/jira/browse/JENA-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305114#comment-17305114
]
Andy Seaborne commented on JENA-2061:
-------------------------------------
This was addressed during work on JENA-2065.
The code now outputs the bad character as {{&#xNNNN;}}, which is still illegal
in XML 1.0.
However, the alternatives are all worse: giving up streaming and risking
out-of-memory in the server; disk I/O "just in case"; or broken XML results.
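For illustration only, here is a minimal Python sketch of the trade-off described above. It is not the Jena code. It assumes the replacement form is a numeric character reference, and the helper names ({{is_xml10_char}}, {{escape_text}}, {{parses_as_xml}}) are hypothetical. The character ranges follow the {{Char}} production of the XML 1.0 specification. Escaping one character at a time keeps the writer streaming, but a conforming XML 1.0 parser still rejects the result.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(text: str) -> str:
    """Escape markup characters and write illegal characters as &#xNNNN;.

    Works character by character, so nothing has to be buffered.
    """
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif is_xml10_char(ord(ch)):
            out.append(ch)
        else:
            # Assumed output form: a hexadecimal character reference.
            out.append("&#x%04X;" % ord(ch))
    return "".join(out)


def parses_as_xml(doc: str) -> bool:
    """Return True if the standard-library parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    literal = "foo \x1b bar"  # the ESC character from the report
    escaped = "<literal>" + escape_text(literal) + "</literal>"
    print(escaped)
    print("parses:", parses_as_xml(escaped))
```

In this sketch both the raw {{ESC}} character and its escaped form fail to parse with Python's standard-library parser, which matches the point that the escaped output is still not legal XML 1.0.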
> Fuseki XML result serializer outputs characters that are illegal per XML spec
> -----------------------------------------------------------------------------
>
> Key: JENA-2061
> URL: https://issues.apache.org/jira/browse/JENA-2061
> Project: Apache Jena
> Issue Type: Bug
> Components: Fuseki
> Affects Versions: Jena 3.15.0
> Environment: We confirmed the reported behavior in three environments:
> * CentOS 8 with OpenJDK 1.8.0_282
> * macOS 10.15 with OpenJDK 13.0.2
> * macOS 10.14 with Java 8 JDK
> Reporter: Julian Gonggrijp
> Priority: Major
>
> Due to a mistake at our end, our application inserted a literal into the
> triple store that included ASCII character {{0x001B}} (below represented as
> {{ESC}}):
> {code:none}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX our: <http://example.org/>
> INSERT DATA {
>   our:example oa:exact "foo ESC bar" .
> }
> {code}
> While this was unintentional and I can't really think of a situation where
> inserting control characters is desirable, this is nevertheless allowed by
> the SPARQL and Turtle specifications. I think. Please correct me if I'm
> wrong. Regardless, Fuseki accepts this update request.
> When we subsequently retrieve the data through a {{SELECT}} query with the
> {{Accept}} header set to {{application/sparql-results+xml}}, the XML includes
> this {{ESC}} character again:
> {code:none}
> SELECT ?c WHERE { ?a ?b ?c . }
> {code}
> {code:xml}
> <?xml version="1.0"?>
> <sparql xmlns="http://www.w3.org/2005/sparql-results#">
>   <head>
>     <variable name="c"/>
>   </head>
>   <results>
>     <result>
>       <binding name="c">
>         <literal>foo ESC bar</literal>
>       </binding>
>     </result>
>   </results>
> </sparql>
> {code}
> This leads to errors when the result XML is parsed downstream.
> If we do a {{CONSTRUCT}} with {{application/rdf+xml}}, the Fuseki server
> returns a {{400 Bad Request}} instead, which I have double-checked is due to
> the presence of the {{ESC}} character.
> *Edit to add:* the set of valid characters per the XML spec is defined
> [here|https://www.w3.org/TR/REC-xml/#charsets].