[
https://issues.apache.org/jira/browse/JENA-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022171#comment-17022171
]
Osma Suominen commented on JENA-1826:
-------------------------------------
Thanks, Andy, for confirming this with riot. I also tested again with Fuseki
and data.nt.
{noformat}
[2020-01-23 16:39:57] Fuseki INFO [6] POST http://localhost:3030/ds/sparql
[2020-01-23 16:39:57] Fuseki INFO [6] Query = CONSTRUCT WHERE { ?s ?p ?o }
[2020-01-23 16:39:57] Fuseki INFO [6] 200 OK (95 ms)
[2020-01-23 16:40:06] Fuseki INFO [7] POST http://localhost:3030/ds/sparql
[2020-01-23 16:40:06] Fuseki INFO [7] Query = CONSTRUCT WHERE { ?s ?p ?o }
[2020-01-23 16:45:46] Fuseki INFO [7] 200 OK (339,646 s)
{noformat}
The first request was for Turtle, the second for RDF/XML. The latter took 5m
40s, so pretty much the same as Andy's riot command.
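
To spot requests like the second one in a longer log, the gap between the first and last log line of each request id can be computed from the timestamps. This is only a sketch: the line format is assumed from the excerpt above, and {{request_durations}} and {{slow_requests}} are hypothetical helper names, not part of Fuseki. The resolution is one second, because that is all the timestamps carry.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpt above:
#   [YYYY-MM-DD HH:MM:SS] Fuseki INFO [request-id] message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+Fuseki\s+INFO\s+\[(?P<req>\d+)\]\s+(?P<msg>.*)$"
)


def request_durations(lines):
    """Return {request id: seconds between its first and last log line}."""
    first, last = {}, {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed shape
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        req = int(match.group("req"))
        first.setdefault(req, ts)  # keep the earliest timestamp seen
        last[req] = ts             # overwrite with the latest one
    return {req: (last[req] - first[req]).total_seconds() for req in first}


def slow_requests(lines, threshold_seconds):
    """Return the sorted ids of requests that took longer than the threshold."""
    durations = request_durations(lines)
    return sorted(req for req, secs in durations.items() if secs > threshold_seconds)


LOG = """
[2020-01-23 16:39:57] Fuseki INFO [6] POST http://localhost:3030/ds/sparql
[2020-01-23 16:39:57] Fuseki INFO [6] Query = CONSTRUCT WHERE { ?s ?p ?o }
[2020-01-23 16:39:57] Fuseki INFO [6] 200 OK (95 ms)
[2020-01-23 16:40:06] Fuseki INFO [7] POST http://localhost:3030/ds/sparql
[2020-01-23 16:40:06] Fuseki INFO [7] Query = CONSTRUCT WHERE { ?s ?p ?o }
[2020-01-23 16:45:46] Fuseki INFO [7] 200 OK (339,646 s)
"""

print(request_durations(LOG.splitlines()))  # {6: 0.0, 7: 340.0}
print(slow_requests(LOG.splitlines(), 30))  # [7]
```

Working from the timestamps rather than the "(339,646 s)" suffix avoids having to guess whether the comma is a decimal or a grouping separator.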
I think there are potentially three things to fix or improve here:
# The RDF/XML serializer shouldn't be this slow (over 5 minutes for 2.2k
triples is pretty excessive), but like Andy said, it can be difficult to fix.
I'm certainly not volunteering... I can open another Jira issue about this if
that's helpful.
# Fuseki shouldn't rely on a serializer that can occasionally be this slow. So
switching to the plain RDF/XML output in Fuseki would make sense to me. I can
investigate and try to put together a PR, if it helps.
# I think it's wrong that the Fuseki query timeout doesn't apply to the
serialization phase. I understand that usually the result serialization is a
tiny slice of the time spent handling the request, but in this case, it
actually takes much longer than the query engine part. If the timeout had
worked, my server (which has a 30 second timeout) would have been much better
off despite spending a lot of CPU on serialization, but since it didn't, it
spent an extra year or so of CPU time before I finally discovered the cause (my
fault really, since I wasn't monitoring it properly). I can also open another
issue about this if that makes sense.
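
The idea in the third point can be sketched as a single deadline that is shared by query execution and result writing. The code below is an illustration only, not how Fuseki or Jena work internally: {{write_with_deadline}} and {{DeadlineExceeded}} are hypothetical names, and the serializer is assumed to hand over its output in chunks.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the shared request deadline passes during serialization."""


def write_with_deadline(chunks, sink, deadline, clock=time.monotonic):
    """Write serialized chunks to sink until the absolute deadline passes.

    `deadline` is a value on the same scale as `clock()`. The intent is to
    pass the same deadline that bounded query execution, so the timeout
    covers the whole request rather than only the query engine part.
    Returns the number of characters (or bytes) written.
    """
    written = 0
    for chunk in chunks:
        # The check runs after each chunk has been produced, so a single
        # long-running chunk is not interrupted, only detected afterwards.
        if clock() > deadline:
            raise DeadlineExceeded(f"deadline passed after {written} units written")
        sink.write(chunk)
        written += len(chunk)
    return written
```

A caller would compute {{deadline = time.monotonic() + 30}} once per request and pass it to both phases. Since the check only runs between chunks, a serializer that does most of its work before emitting anything would still need a check inside its own loop, or to be run in a thread that can be cancelled.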
> Fuseki RDF/XML response never finishes
> --------------------------------------
>
> Key: JENA-1826
> URL: https://issues.apache.org/jira/browse/JENA-1826
> Project: Apache Jena
> Issue Type: Bug
> Components: Fuseki
> Affects Versions: Jena 3.14.0
> Environment: Ubuntu 16.04
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
> Reporter: Osma Suominen
> Priority: Major
> Attachments: W00067442800.ttl, data.nt
>
>
> I have a web app running SPARQL CONSTRUCT queries against Fuseki and
> generating web pages. I noticed that Fuseki started hogging all CPU cores a
> few hours after it was restarted. It turned out that some of the CONSTRUCT
> queries take a very long time to complete: at least 40 minutes, probably
> more, and it seems quite likely they will never finish.
> I was able to turn this into a fairly minimal example. I've attached a 1.3MB
> Turtle file (~29k triples) with all the data necessary to demonstrate the
> problem.
> Start Fuseki like this: {{./fuseki-server --file W00067442800.ttl /ds}}
> Then open the Fuseki web UI and run this SPARQL query against the dataset:
> {noformat}
> PREFIX schema: <http://schema.org/>
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> CONSTRUCT {
>   <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>   ?o schema:name ?oname ;
>      skos:prefLabel ?olabel .
>   ?inst ?instprop ?instval .
>   ?instval schema:name ?instvalName ;
>            skos:prefLabel ?instvalLabel .
> }
> WHERE {
>   {
>     <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>     OPTIONAL {
>       {
>         ?o schema:name ?oname
>       } UNION {
>         ?o skos:prefLabel ?olabel
>       }
>     }
>   } UNION {
>     {
>       <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst
>     } OPTIONAL {
>       {
>         ?inst ?instprop ?instval .
>         OPTIONAL {
>           {
>             ?instval schema:name ?instvalName
>           } UNION {
>             ?instval skos:prefLabel ?instvalLabel
>           }
>         }
>       }
>     }
>   }
> }
> {noformat}
> If you select Turtle as the content type, the query will finish in around 3
> seconds (plus rendering the result in the browser takes a while). If instead
> you select XML as the format, the query will just keep running, with Fuseki
> taking over a single CPU core completely. With several such queries running,
> all the CPU cores will eventually be used.
> This can also be demonstrated using curl (with the above query saved as
> {{query.rq}}):
> {noformat}
> curl -H 'Accept: text/turtle' --data-urlencode "query@query.rq" http://localhost:3030/ds/sparql
> {noformat}
> works fine and gives you the Turtle output;
> {noformat}
> curl -H 'Accept: application/rdf+xml' --data-urlencode "query@query.rq" http://localhost:3030/ds/sparql
> {noformat}
> never seems to finish.
> Perhaps even worse, a query timeout setting doesn't help. If I
> start Fuseki with a 10 second query timeout, i.e. {{--timeout 10000}}, it
> still won't stop the query from hogging the CPU forever. I'm guessing that
> the problem is in the final stages of the query processing, when the results
> just have to be serialized into the correct syntax, and the timeout is no
> longer applied in this stage.
> I discovered this problem while running Fuseki 3.5.0, but it happens with the
> most recent release 3.14.0 as well.