[
https://issues.apache.org/jira/browse/JENA-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458671#comment-15458671
]
ASF GitHub Bot commented on JENA-1229:
--------------------------------------
Github user afs commented on the issue:
https://github.com/apache/jena/pull/166
In the list of commits above, the first commit says "ehedgehog committed on
GitHub on 27 Jul" then a merge from jena/master 2 days ago, then the JENA-1229
change. Just a guess: you created a branch, then merged jena/master, rather
then getting your local master up to date then branching. But it seems github
works it out because if you go to the "166.diff" URL, then there is a short
diff, no merge items.
> fuseki query timeouts not working correctly with DISTINCT
> ---------------------------------------------------------
>
> Key: JENA-1229
> URL: https://issues.apache.org/jira/browse/JENA-1229
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ, Fuseki
> Affects Versions: FUSEKI 2.4.1-SNAPSHOT
> Environment: Linux (Ubuntu or Fedora)
> Reporter: christopher james dollin
>
> We have been having problems with timeouts on some of our
> production servers. On certain queries, timeouts of some
> tens of seconds do not appear to fire, no results are
> generated by the query, and the query continues to
> consume a CPU. If left long enough (more than an hour)
> the query finally terminates in a NullPointerExplosion.
> We have been able to replicate the issue using generated
> data and a slimmed-down query with four triples and
> DISTINCT.
> Investigation with the debugger led us to QueryIterDistinct's
> requestSubCancel method. Unlike other requestSubCancel
> declarations, its body is not empty and it specifically
> super.closes the current iterator. Further, following the
> chain of cancel-requestCancel methods from the
> invocation of the timeout shows that the cancels stop
> here.
> We think that the cancellation protocol is satisfied
> by having QueryIterDistinct.requestSubCancel have
> an empty body, but we're not sure we've understood
> it completely. Certainly removing requestSubCancel's
> body removed the presenting problem, with the query
> timing out cleanly.
> A test dataset can be generated using this bash script:
> #!/bin/bash
> echo "PREFIX : <http://fake.epimorphics.com/timeouts/>"
> echo "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"
> let A=1
> let B=1
> let C=1
> for x in {10000..19999}; do
> let A=A+1; if [ $A == 3 ]; then let A=1; fi
> let B=B+1; if [ $B == 5 ]; then let B=1; fi
> let C=C+1; if [ $C == 7 ]; then let C=1; fi
> let INDEX=C+20
> let MONTH=B+5
> let DAY=28+A
> export DATE=1995-0$MONTH-$DAY
> export ID=id-$x
> echo "<eh:item-$x>" ":transactionId" '"'$ID'"' .
> echo "<eh:item-$x>" ":transactionDate" '"'$DATE'"^^xsd:date' .
> echo "<eh:item-$x>" ":address" "<eh:address-$x>" .
> echo "<eh:address-$x>" ":index" '"'$INDEX'"^^xsd:string' .
> done
> [This size of data is enough to ensure that, at least on the local
> testing machines, the (sub) iterations through all the possible results
> takes long enough to be obvious.]
> Run the generator to send generated Turtle to stdout
> and pipe it to a .ttl file and then load that file into an
> emty TDB using tdbloader; run a Fuseki with the config below
> after tweaking the data directory name.
> @prefix : <#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> [] rdf:type fuseki:Server ;
> # Timeout - server-wide default: milliseconds.
> # Format 1: "1000" -- 1 second timeout
> # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout
> to for rest of query.
> # See java doc for ARQ.queryTimeout
> ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000,1000"
> ] ;
> # ja:loadClass "your.code.Class" ;
> fuseki:services (
> <#service_ds>
> ) .
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
> tdb:GraphTDB rdfs:subClassOf ja:Model .
> <#service_ds> rdf:type fuseki:Service ;
> rdfs:label "TDB Service (RW)" ;
> fuseki:name "ds" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset <#ds> ;
> .
> <#ds> rdf:type tdb:DatasetTDB ;
> tdb:location "/home/chris/faking/DATASET" ;
> # tdb:unionDefaultGraph true ;
> .
> Our test query is:
> PREFIX : <http://fake.epimorphics.com/timeouts/>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> SELECT DISTINCT ?item WHERE {
> ?address :index "22"^^xsd:string .
> ?item :transactionDate "1995-07-30"^^xsd:date .
> ?item :transactionId "id-100000" .
> ?item :address ?address .
> } OFFSET 0 LIMIT 1
> Running this query /should/ deliver one answer, item 100000,
> except it will (attempt to) time out first and fail to shut the query
> down cleanly until eventually it runs out of internal iterations.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)