Osma Suominen created JENA-1826:
-----------------------------------

             Summary: Fuseki RDF/XML response never finishes
                 Key: JENA-1826
                 URL: https://issues.apache.org/jira/browse/JENA-1826
             Project: Apache Jena
          Issue Type: Bug
          Components: Fuseki
    Affects Versions: Jena 3.14.0
         Environment: Ubuntu 16.04
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

            Reporter: Osma Suominen
         Attachments: W00067442800.ttl

I have a web app that runs SPARQL CONSTRUCT queries against Fuseki and 
generates web pages from the results. I noticed that Fuseki started hogging 
all CPU cores a few hours after it was restarted. It turned out that some of 
the CONSTRUCT queries take a very long time to complete: at least 40 minutes, 
probably more, and it seems quite likely that they never finish.

I was able to turn this into a fairly minimal example. I've attached a 1.3MB 
Turtle file (~29k triples) with all the data necessary to demonstrate the 
problem. 

Start Fuseki like this: {{./fuseki-server --file W00067442800.ttl /ds}}

Then open the Fuseki web UI and run this SPARQL query against the dataset:

{noformat}
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {
  <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
  ?o schema:name ?oname ;
    skos:prefLabel ?olabel .
  ?inst ?instprop ?instval .
  ?instval schema:name ?instvalName ;
    skos:prefLabel ?instvalLabel .
}
WHERE {
  {
    <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
    OPTIONAL {
      {
        ?o schema:name ?oname
      } UNION {
        ?o skos:prefLabel ?olabel
      }
    }
  } UNION {
    {
      <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst
    } OPTIONAL {
      {
        ?inst ?instprop ?instval .
        OPTIONAL {
          {
            ?instval schema:name ?instvalName
          } UNION {
            ?instval skos:prefLabel ?instvalLabel
          }
        }
      }
    }
  }
}
{noformat}

If you select Turtle as the content type, the query finishes in around 3 
seconds (although rendering the result in the browser takes a while longer). 
If you instead select XML as the format, the query just keeps running, with 
Fuseki taking over a single CPU core completely. With several such queries 
running, all the CPU cores will eventually be used.

This can also be demonstrated using curl (with the above query saved as 
{{query.rq}}):

{noformat}
curl -H 'Accept: text/turtle' --data-urlencode "query@query.rq" \
  http://localhost:3030/ds/sparql
{noformat}

works fine and gives you the Turtle output;

{noformat}
curl -H 'Accept: application/rdf+xml' --data-urlencode "query@query.rq" \
  http://localhost:3030/ds/sparql
{noformat}

never seems to finish.
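
For anyone who wants to script the comparison, here is a minimal Python sketch of the same two requests using only the standard library. The function names ({{build_request}}, {{time_query}}) and the 60 second client-side timeout are my own illustrative choices, not anything provided by Fuseki. Note that the {{urlopen}} timeout applies to each blocking socket operation rather than to the request as a whole, so it only fires if the server sends nothing for that long.

```python
import socket
import time
import urllib.error
import urllib.parse
import urllib.request

# Endpoint from the curl examples in this report.
ENDPOINT = "http://localhost:3030/ds/sparql"


def build_request(query, accept, endpoint=ENDPOINT):
    """Build a form-encoded SPARQL POST request, like curl --data-urlencode."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(endpoint, data=body, headers={"Accept": accept})


def time_query(query, accept, timeout_s=60.0, endpoint=ENDPOINT):
    """Return the seconds taken to read the full response, or None on a socket timeout.

    timeout_s is a client-side, per-socket-operation limit (an arbitrary
    value chosen for this sketch); it is unrelated to Fuseki's --timeout.
    """
    request = build_request(query, accept, endpoint)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.read()
    except socket.timeout:
        return None
    except urllib.error.URLError as err:
        # Timeouts while connecting or sending are wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return None
        raise
    return time.monotonic() - start
```

With Fuseki running as described above, read {{query.rq}} into a string and call {{time_query(query, "text/turtle")}} and {{time_query(query, "application/rdf+xml")}}; a {{None}} result means the client gave up waiting.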

Perhaps even worse, a query timeout setting doesn't help either. If I start 
Fuseki with a 10 second query timeout, i.e. {{--timeout 10000}}, it still 
won't stop the query from hogging the CPU forever. I'm guessing that the 
problem is in the final stage of query processing, when the results only 
have to be serialized into the requested syntax, and that the timeout is no 
longer applied at this stage.
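
To make that guess concrete, here is a purely illustrative toy model in Python. It is not Fuseki or Jena code, and all names in it ({{evaluate}}, {{handle_request}}, {{slow_serializer}}, {{QueryTimeout}}) are hypothetical. It only shows how a deadline that is checked while results are being produced would have no effect on a serialization step that runs afterwards.

```python
import time


class QueryTimeout(Exception):
    """Raised when the evaluation stage exceeds its deadline."""


def evaluate(rows, deadline):
    """Yield result rows, checking the deadline after each row is produced."""
    for row in rows:
        if time.monotonic() > deadline:
            raise QueryTimeout("deadline exceeded during evaluation")
        yield row


def handle_request(rows, timeout_s, serializer):
    """Toy request handler: guarded evaluation, then unguarded serialization."""
    deadline = time.monotonic() + timeout_s
    results = list(evaluate(rows, deadline))  # the timeout can fire here
    return serializer(results)                # nothing checks the deadline here


def slow_serializer(results):
    """Stand-in for a serializer that takes far longer than the timeout."""
    time.sleep(0.1)
    return ",".join(str(r) for r in results)
```

In this model, {{handle_request([1, 2, 3], 0.02, slow_serializer)}} returns normally even though the call takes several times longer than the timeout, which matches the behaviour I am seeing. Whether Fuseki is actually structured like this is an assumption on my part.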

I discovered this problem while running Fuseki 3.5.0, but it happens with the 
most recent release 3.14.0 as well.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
