[ 
https://issues.apache.org/jira/browse/JENA-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-1229:
--------------------------------
    Description: 
We have been having problems with timeouts on some of our
production servers. On certain queries, timeouts of some
tens of seconds do not appear to fire, no results are 
generated by the query, and the query continues to 
consume a CPU. If left long enough (more than an hour)
the query finally terminates in a NullPointerExplosion.

We have been able to replicate the issue using generated
data and a slimmed-down query with four triples and
DISTINCT.

Investigation with the debugger led us to QueryIterDistinct's
requestSubCancel method. Unlike other requestSubCancel
declarations, its body is not empty and it specifically
super.closes the current iterator. Further, following the
chain of cancel-requestCancel methods from the 
invocation of the timeout shows that the cancels stop
here.

We think that the cancellation protocol is satisfied
by having QueryIterDistinct.requestSubCancel have
an empty body, but we're not sure we've understood
it completely. Certainly removing requestSubCancel's
body removed the presenting problem, with the query
timing out cleanly.

A test dataset can be generated using this bash script:
{noformat}
#!/bin/bash

echo "PREFIX  :     <http://fake.epimorphics.com/timeouts/>"
echo "PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>"

let A=1
let B=1
let C=1

for x in {10000..19999}; do
    let A=A+1; if [ $A == 3 ]; then let A=1; fi
    let B=B+1; if [ $B == 5 ]; then let B=1; fi
    let C=C+1; if [ $C == 7 ]; then let C=1; fi

    let INDEX=C+20
    let MONTH=B+5
    let DAY=28+A

    export DATE=1995-0$MONTH-$DAY
    export ID=id-$x

    echo "<eh:item-$x>" ":transactionId" '"'$ID'"' .
    echo "<eh:item-$x>" ":transactionDate" '"'$DATE'"^^xsd:date' .
    echo "<eh:item-$x>" ":address" "<eh:address-$x>" .

    echo "<eh:address-$x>" ":index" '"'$INDEX'"^^xsd:string' .
done
{noformat}
[This size of data is enough to ensure that, at least on the local
testing machines, the (sub) iterations through all the possible results
takes long enough to be obvious.]

Run the  generator to send generated Turtle to stdout
and pipe it to a .ttl file and then load that file into an
emty TDB using tdbloader; run a Fuseki with the config below
after tweaking the data directory name.
{noformat}
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout 
to for rest of query.
   # See java doc for ARQ.queryTimeout
        ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000,1000" ] 
;
   # ja:loadClass "your.code.Class" ;

   fuseki:services (
     <#service_ds>
   ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

<#service_ds> rdf:type fuseki:Service ;
    rdfs:label                             "TDB Service (RW)" ;
    fuseki:name                            "ds" ;
    fuseki:serviceQuery                    "query" ;
    fuseki:serviceQuery                    "sparql" ;
    fuseki:serviceUpdate                   "update" ;
    fuseki:serviceUpload                   "upload" ;
    fuseki:serviceReadWriteGraphStore      "data" ;
    fuseki:dataset                         <#ds> ;
    .

<#ds> rdf:type      tdb:DatasetTDB ;
        tdb:location   "/home/chris/faking/DATASET" ;
        # tdb:unionDefaultGraph true ;
     .
{noformat}




Our test query is:
{noformat}
PREFIX  :     <http://fake.epimorphics.com/timeouts/>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT  ?item WHERE   {

?address :index "22"^^xsd:string .

?item :transactionDate "1995-07-30"^^xsd:date .
?item :transactionId "id-100000" .

?item :address ?address .

} OFFSET  0 LIMIT   1
{noformat}
Running this query /should/ deliver one answer, item 100000,
except it will (attempt to) time out first and fail to shut the query
down cleanly until eventually it runs out of internal iterations.
 














  was:
We have been having problems with timeouts on some of our
production servers. On certain queries, timeouts of some
tens of seconds do not appear to fire, no results are 
generated by the query, and the query continues to 
consume a CPU. If left long enough (more than an hour)
the query finally terminates in a NullPointerExplosion.

We have been able to replicate the issue using generated
data and a slimmed-down query with four triples and
DISTINCT.

Investigation with the debugger led us to QueryIterDistinct's
requestSubCancel method. Unlike other requestSubCancel
declarations, its body is not empty and it specifically
super.closes the current iterator. Further, following the
chain of cancel-requestCancel methods from the 
invocation of the timeout shows that the cancels stop
here.

We think that the cancellation protocol is satisfied
by having QueryIterDistinct.requestSubCancel have
an empty body, but we're not sure we've understood
it completely. Certainly removing requestSubCancel's
body removed the presenting problem, with the query
timing out cleanly.

A test dataset can be generated using this bash script:

#!/bin/bash

echo "PREFIX  :     <http://fake.epimorphics.com/timeouts/>"
echo "PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>"

let A=1
let B=1
let C=1

for x in {10000..19999}; do
    let A=A+1; if [ $A == 3 ]; then let A=1; fi
    let B=B+1; if [ $B == 5 ]; then let B=1; fi
    let C=C+1; if [ $C == 7 ]; then let C=1; fi

    let INDEX=C+20
    let MONTH=B+5
    let DAY=28+A

    export DATE=1995-0$MONTH-$DAY
    export ID=id-$x

    echo "<eh:item-$x>" ":transactionId" '"'$ID'"' .
    echo "<eh:item-$x>" ":transactionDate" '"'$DATE'"^^xsd:date' .
    echo "<eh:item-$x>" ":address" "<eh:address-$x>" .

    echo "<eh:address-$x>" ":index" '"'$INDEX'"^^xsd:string' .
done

[This size of data is enough to ensure that, at least on the local
testing machines, the (sub) iterations through all the possible results
takes long enough to be obvious.]

Run the  generator to send generated Turtle to stdout
and pipe it to a .ttl file and then load that file into an
emty TDB using tdbloader; run a Fuseki with the config below
after tweaking the data directory name.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout 
to for rest of query.
   # See java doc for ARQ.queryTimeout
        ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000,1000" ] 
;
   # ja:loadClass "your.code.Class" ;

   fuseki:services (
     <#service_ds>
   ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

<#service_ds> rdf:type fuseki:Service ;
    rdfs:label                             "TDB Service (RW)" ;
    fuseki:name                            "ds" ;
    fuseki:serviceQuery                    "query" ;
    fuseki:serviceQuery                    "sparql" ;
    fuseki:serviceUpdate                   "update" ;
    fuseki:serviceUpload                   "upload" ;
    fuseki:serviceReadWriteGraphStore      "data" ;
    fuseki:dataset                         <#ds> ;
    .

<#ds> rdf:type      tdb:DatasetTDB ;
        tdb:location   "/home/chris/faking/DATASET" ;
        # tdb:unionDefaultGraph true ;
     .





Our test query is:

PREFIX  :     <http://fake.epimorphics.com/timeouts/>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT  ?item WHERE   {

?address :index "22"^^xsd:string .

?item :transactionDate "1995-07-30"^^xsd:date .
?item :transactionId "id-100000" .

?item :address ?address .

} OFFSET  0 LIMIT   1

Running this query /should/ deliver one answer, item 100000,
except it will (attempt to) time out first and fail to shut the query
down cleanly until eventually it runs out of internal iterations.
 















> fuseki query timeouts not working correctly with DISTINCT
> ---------------------------------------------------------
>
>                 Key: JENA-1229
>                 URL: https://issues.apache.org/jira/browse/JENA-1229
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ, Fuseki
>    Affects Versions: FUSEKI 2.4.1-SNAPSHOT
>         Environment: Linux (Ubuntu or Fedora)
>            Reporter: christopher james dollin
>
> We have been having problems with timeouts on some of our
> production servers. On certain queries, timeouts of some
> tens of seconds do not appear to fire, no results are 
> generated by the query, and the query continues to 
> consume a CPU. If left long enough (more than an hour)
> the query finally terminates in a NullPointerExplosion.
> We have been able to replicate the issue using generated
> data and a slimmed-down query with four triples and
> DISTINCT.
> Investigation with the debugger led us to QueryIterDistinct's
> requestSubCancel method. Unlike other requestSubCancel
> declarations, its body is not empty and it specifically
> super.closes the current iterator. Further, following the
> chain of cancel-requestCancel methods from the 
> invocation of the timeout shows that the cancels stop
> here.
> We think that the cancellation protocol is satisfied
> by having QueryIterDistinct.requestSubCancel have
> an empty body, but we're not sure we've understood
> it completely. Certainly removing requestSubCancel's
> body removed the presenting problem, with the query
> timing out cleanly.
> A test dataset can be generated using this bash script:
> {noformat}
> #!/bin/bash
> echo "PREFIX  :     <http://fake.epimorphics.com/timeouts/>"
> echo "PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>"
> let A=1
> let B=1
> let C=1
> for x in {10000..19999}; do
>     let A=A+1; if [ $A == 3 ]; then let A=1; fi
>     let B=B+1; if [ $B == 5 ]; then let B=1; fi
>     let C=C+1; if [ $C == 7 ]; then let C=1; fi
>     let INDEX=C+20
>     let MONTH=B+5
>     let DAY=28+A
>     export DATE=1995-0$MONTH-$DAY
>     export ID=id-$x
>     echo "<eh:item-$x>" ":transactionId" '"'$ID'"' .
>     echo "<eh:item-$x>" ":transactionDate" '"'$DATE'"^^xsd:date' .
>     echo "<eh:item-$x>" ":address" "<eh:address-$x>" .
>     echo "<eh:address-$x>" ":index" '"'$INDEX'"^^xsd:string' .
> done
> {noformat}
> [This size of data is enough to ensure that, at least on the local
> testing machines, the (sub) iterations through all the possible results
> takes long enough to be obvious.]
> Run the  generator to send generated Turtle to stdout
> and pipe it to a .ttl file and then load that file into an
> emty TDB using tdbloader; run a Fuseki with the config below
> after tweaking the data directory name.
> {noformat}
> @prefix :        <#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> [] rdf:type fuseki:Server ;
>    # Timeout - server-wide default: milliseconds.
>    # Format 1: "1000" -- 1 second timeout
>    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout 
> to for rest of query.
>    # See java doc for ARQ.queryTimeout
>         ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000,1000" 
> ] ;
>    # ja:loadClass "your.code.Class" ;
>    fuseki:services (
>      <#service_ds>
>    ) .
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
> <#service_ds> rdf:type fuseki:Service ;
>     rdfs:label                             "TDB Service (RW)" ;
>     fuseki:name                            "ds" ;
>     fuseki:serviceQuery                    "query" ;
>     fuseki:serviceQuery                    "sparql" ;
>     fuseki:serviceUpdate                   "update" ;
>     fuseki:serviceUpload                   "upload" ;
>     fuseki:serviceReadWriteGraphStore      "data" ;
>     fuseki:dataset                         <#ds> ;
>     .
> <#ds> rdf:type      tdb:DatasetTDB ;
>         tdb:location   "/home/chris/faking/DATASET" ;
>         # tdb:unionDefaultGraph true ;
>      .
> {noformat}
> Our test query is:
> {noformat}
> PREFIX  :     <http://fake.epimorphics.com/timeouts/>
> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
> SELECT DISTINCT  ?item WHERE   {
> ?address :index "22"^^xsd:string .
> ?item :transactionDate "1995-07-30"^^xsd:date .
> ?item :transactionId "id-100000" .
> ?item :address ?address .
> } OFFSET  0 LIMIT   1
> {noformat}
> Running this query /should/ deliver one answer, item 100000,
> except it will (attempt to) time out first and fail to shut the query
> down cleanly until eventually it runs out of internal iterations.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to