Osma Suominen created JENA-1826:
-----------------------------------
Summary: Fuseki RDF/XML response never finishes
Key: JENA-1826
URL: https://issues.apache.org/jira/browse/JENA-1826
Project: Apache Jena
Issue Type: Bug
Components: Fuseki
Affects Versions: Jena 3.14.0
Environment: Ubuntu 16.04
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
Reporter: Osma Suominen
Attachments: W00067442800.ttl
I have a web app that runs SPARQL CONSTRUCT queries against Fuseki and generates
web pages from the results. I noticed that Fuseki started hogging all CPU cores
a few hours after it was restarted. It turned out that some of the CONSTRUCT
queries take a very long time to complete: at least 40 minutes, probably more,
and it seems quite likely they will never finish.
I was able to reduce this to a fairly minimal example. I've attached a 1.3MB
Turtle file (~29k triples) containing all the data needed to demonstrate the
problem.
Start Fuseki like this: {{./fuseki-server --file W00067442800.ttl /ds}}
Then open the Fuseki web UI and run this SPARQL query against the dataset:
{noformat}
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {
  <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
  ?o schema:name ?oname ;
     skos:prefLabel ?olabel .
  ?inst ?instprop ?instval .
  ?instval schema:name ?instvalName ;
           skos:prefLabel ?instvalLabel .
}
WHERE {
  {
    <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
    OPTIONAL {
      {
        ?o schema:name ?oname
      } UNION {
        ?o skos:prefLabel ?olabel
      }
    }
  } UNION {
    {
      <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst
    } OPTIONAL {
      {
        ?inst ?instprop ?instval .
        OPTIONAL {
          {
            ?instval schema:name ?instvalName
          } UNION {
            ?instval skos:prefLabel ?instvalLabel
          }
        }
      }
    }
  }
}
{noformat}
If you select Turtle as the content type, the query finishes in around 3
seconds (plus some extra time for the browser to render the result). If you
select XML instead, the query just keeps running, with Fuseki fully occupying a
single CPU core. With several such queries running, all CPU cores eventually
become saturated.
This can also be demonstrated using curl (with the above query saved as
{{query.rq}}):
{noformat}
curl -H 'Accept: text/turtle' --data-urlencode "query@query.rq" \
  http://localhost:3030/ds/sparql
{noformat}
works fine and gives you the Turtle output;
{noformat}
curl -H 'Accept: application/rdf+xml' --data-urlencode "query@query.rq" \
  http://localhost:3030/ds/sparql
{noformat}
never seems to finish.
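To observe the symptom from the command line without leaving a client blocked indefinitely, the two curl requests can be wrapped with a client-side time limit. This is a minimal sketch, assuming a POSIX shell and curl; {{probe_format}} is a hypothetical helper name and the 60-second limit used below is an arbitrary choice (the Turtle request was observed to finish in about 3 seconds). curl's exit code 28 means "operation timed out". Note that this only bounds how long the client waits; given the behaviour described here, it should not be expected to stop Fuseki from consuming CPU on the server side.

```shell
#!/bin/sh
# Sketch only: run a SPARQL query once with the given Accept type and report
# whether a complete response arrived within a client-side time limit.
# probe_format is a hypothetical helper, not part of Fuseki or Jena.
#
# Usage: probe_format ENDPOINT QUERY_FILE ACCEPT_TYPE MAX_SECONDS
probe_format() {
    endpoint=$1
    query_file=$2
    accept=$3
    limit=$4

    # --max-time bounds the whole transfer on the client side only; it does
    # not cancel any work Fuseki may still be doing for the request.
    # --fail makes HTTP error responses count as failures, not completions.
    curl --silent --fail --output /dev/null --max-time "$limit" \
        -H "Accept: $accept" \
        --data-urlencode "query@$query_file" \
        "$endpoint"
    status=$?

    case $status in
        0)  echo "$accept: completed within ${limit}s" ;;
        28) echo "$accept: no complete response after ${limit}s (curl timed out)" ;;
        *)  echo "$accept: curl failed with exit code $status" ;;
    esac
}
```

Usage against the dataset started above would be {{probe_format http://localhost:3030/ds/sparql query.rq text/turtle 60}}, followed by the same call with {{application/rdf+xml}}. Based on the behaviour described in this report, the first call would be expected to complete and the second to hit the time limit.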
Perhaps even worse, a query timeout setting doesn't help either. If I start
Fuseki with a 10 second query timeout, i.e. {{--timeout 10000}}, the query
still keeps hogging the CPU indefinitely. I'm guessing that the problem lies in
the final stage of query processing, when the results are serialized into the
requested syntax, and that the timeout is no longer applied at that stage.
I discovered this problem while running Fuseki 3.5.0, but it also happens with
the most recent release, 3.14.0.