[jira] [Commented] (JENA-1229) fuseki query timeouts not working correctly with DISTINCT

ASF GitHub Bot (JIRA) Fri, 02 Sep 2016 07:23:49 -0700

    [ https://issues.apache.org/jira/browse/JENA-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458671#comment-15458671 ]


ASF GitHub Bot commented on JENA-1229:
--------------------------------------

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/166
  
    In the list of commits above, the first commit says "ehedgehog committed
    on GitHub on 27 Jul", then there is a merge from jena/master 2 days ago,
    then the JENA-1229 change.  Just a guess: you created a branch and then
    merged jena/master into it, rather than getting your local master up to
    date and then branching.  But it seems GitHub works it out, because the
    "166.diff" URL shows a short diff with no merge items.


> fuseki query timeouts not working correctly with DISTINCT
> ---------------------------------------------------------
>
>                 Key: JENA-1229
>                 URL: https://issues.apache.org/jira/browse/JENA-1229
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ, Fuseki
>    Affects Versions: FUSEKI 2.4.1-SNAPSHOT
>         Environment: Linux (Ubuntu or Fedora)
>            Reporter: christopher james dollin
>
> We have been having problems with timeouts on some of our
> production servers. On certain queries, timeouts of some
> tens of seconds do not appear to fire, no results are 
> generated by the query, and the query continues to 
> consume a CPU. If left long enough (more than an hour)
> the query finally terminates in a NullPointerExplosion.
> We have been able to replicate the issue using generated
> data and a slimmed-down query with four triples and
> DISTINCT.
> Investigation with the debugger led us to QueryIterDistinct's
> requestSubCancel method. Unlike other requestSubCancel
> declarations, its body is not empty and it specifically
> super.closes the current iterator. Further, following the
> chain of cancel-requestCancel methods from the 
> invocation of the timeout shows that the cancels stop
> here.
> We think that the cancellation protocol is satisfied
> by having QueryIterDistinct.requestSubCancel have
> an empty body, but we're not sure we've understood
> it completely. Certainly removing requestSubCancel's
> body removed the presenting problem, with the query
> timing out cleanly.
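
To illustrate the failure mode described above, here is a minimal sketch of
a cancel request travelling down a chain of iterators. It is not Jena's
code: the Stage class, the stage names and the "propagates" flag are all
hypothetical, and the sketch only models the reported symptom (a cancel
that stops at the DISTINCT stage and never reaches the source), not the
actual body of QueryIterDistinct.requestSubCancel.

```java
import java.util.ArrayList;
import java.util.List;

public class CancelChainSketch {
    /** Hypothetical stand-in for one stage in a chain of query iterators. */
    static class Stage {
        final String name;
        final Stage input;         // upstream stage, or null for the source
        final boolean propagates;  // does cancel() pass the request upstream?
        boolean cancelled = false;

        Stage(String name, Stage input, boolean propagates) {
            this.name = name;
            this.input = input;
            this.propagates = propagates;
        }

        /** Mark this stage as cancelled and, if well behaved, tell the input too. */
        void cancel() {
            cancelled = true;
            if (propagates && input != null) {
                input.cancel();
            }
        }
    }

    /**
     * Builds the chain source <- distinct <- slice, cancels from the top
     * (as a timeout would), and returns the names of the stages that
     * actually received the cancel.
     */
    static List<String> cancelledStages(boolean distinctPropagates) {
        Stage source = new Stage("source", null, true);
        Stage distinct = new Stage("distinct", source, distinctPropagates);
        Stage slice = new Stage("slice", distinct, true);

        slice.cancel();

        List<String> seen = new ArrayList<>();
        for (Stage s : new Stage[] { slice, distinct, source }) {
            if (s.cancelled) {
                seen.add(s.name);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Well-behaved chain: prints [slice, distinct, source]
        System.out.println(cancelledStages(true));
        // Cancel stops at the distinct stage: prints [slice, distinct]
        System.out.println(cancelledStages(false));
    }
}
```

In the second case the source stage never learns that it should stop, which
matches the observation that the query keeps consuming a CPU after the
timeout has fired.
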
> A test dataset can be generated using this bash script:
> #!/bin/bash
> echo "PREFIX  :     <http://fake.epimorphics.com/timeouts/>"
> echo "PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>"
> let A=1
> let B=1
> let C=1
> for x in {10000..19999}; do
>     let A=A+1; if [ $A == 3 ]; then let A=1; fi
>     let B=B+1; if [ $B == 5 ]; then let B=1; fi
>     let C=C+1; if [ $C == 7 ]; then let C=1; fi
>     let INDEX=C+20
>     let MONTH=B+5
>     let DAY=28+A
>     export DATE=1995-0$MONTH-$DAY
>     export ID=id-$x
>     echo "<eh:item-$x>" ":transactionId" '"'$ID'"' .
>     echo "<eh:item-$x>" ":transactionDate" '"'$DATE'"^^xsd:date' .
>     echo "<eh:item-$x>" ":address" "<eh:address-$x>" .
>     echo "<eh:address-$x>" ":index" '"'$INDEX'"^^xsd:string' .
> done
> [This size of data is enough to ensure that, at least on the local
> testing machines, the (sub) iterations through all the possible results
> takes long enough to be obvious.]
> Run the generator, redirecting the generated Turtle from stdout
> into a .ttl file, and then load that file into an
> empty TDB using tdbloader; run a Fuseki with the config below
> after tweaking the data directory name.
> @prefix :        <#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> [] rdf:type fuseki:Server ;
>    # Timeout - server-wide default: milliseconds.
>    # Format 1: "1000" -- 1 second timeout
>    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
>    # See the Javadoc for ARQ.queryTimeout
>         ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000,1000" ] ;
>    # ja:loadClass "your.code.Class" ;
>    fuseki:services (
>      <#service_ds>
>    ) .
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
> <#service_ds> rdf:type fuseki:Service ;
>     rdfs:label                             "TDB Service (RW)" ;
>     fuseki:name                            "ds" ;
>     fuseki:serviceQuery                    "query" ;
>     fuseki:serviceQuery                    "sparql" ;
>     fuseki:serviceUpdate                   "update" ;
>     fuseki:serviceUpload                   "upload" ;
>     fuseki:serviceReadWriteGraphStore      "data" ;
>     fuseki:dataset                         <#ds> ;
>     .
> <#ds> rdf:type      tdb:DatasetTDB ;
>         tdb:location   "/home/chris/faking/DATASET" ;
>         # tdb:unionDefaultGraph true ;
>      .
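
The two timeout formats named in the config comments ("1000" and
"10000,60000") can be read as in the following sketch. This is only an
illustration of the string format, not ARQ's parser: the class name, the
UNSET marker and the two-element array layout are assumptions made for the
example.

```java
public class TimeoutSpecSketch {
    /** Hypothetical marker meaning "no separate first-result timeout given". */
    static final long UNSET = -1L;

    /**
     * Parses a timeout string in milliseconds.
     * Returns {firstResultMs, timeoutMs}: for "1000" the first slot is UNSET
     * and the second is the single timeout; for "10000,60000" the first slot
     * is the time allowed until the first result and the second is the time
     * allowed for the rest of the query.
     */
    static long[] parse(String spec) {
        String[] parts = spec.trim().split(",");
        if (parts.length == 1) {
            return new long[] { UNSET, Long.parseLong(parts[0].trim()) };
        }
        if (parts.length == 2) {
            return new long[] {
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim())
            };
        }
        throw new IllegalArgumentException("Expected one or two values: " + spec);
    }

    public static void main(String[] args) {
        long[] t = parse("1000,1000"); // the value used in the config above
        System.out.println("first result: " + t[0] + " ms, rest: " + t[1] + " ms");
    }
}
```

With the config value "1000,1000", both the first result and the remainder
of the query are given one second each.
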
> Our test query is:
> PREFIX  :     <http://fake.epimorphics.com/timeouts/>
> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
> SELECT DISTINCT  ?item WHERE   {
> ?address :index "22"^^xsd:string .
> ?item :transactionDate "1995-07-30"^^xsd:date .
> ?item :transactionId "id-100000" .
> ?item :address ?address .
> } OFFSET  0 LIMIT   1
> Running this query /should/ deliver one answer, item 100000,
> except it will (attempt to) time out first and fail to shut the query
> down cleanly until eventually it runs out of internal iterations.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
