Claus Stadler created JENA-1858:
-----------------------------------

             Summary: SERVICE in SPARQL blocks after a while
                 Key: JENA-1858
                 URL: https://issues.apache.org/jira/browse/JENA-1858
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 3.14.0
            Reporter: Claus Stadler


Hi once again :)
I wanted to set up a quick RDF/SPARQL-based system for monitoring whether 
services are online or offline, along these lines:

* Keep a list of endpoints in [this dataset|https://github.com/SmartDataAnalytics/Meta-LOD/blob/master/sparql-endpoints.ttl]
* Have a CI process run this SPARQL query and publish/commit the results to a 
file

{code}
PREFIX eg: <http://www.example.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

CONSTRUCT {
    ?s eg:serviceStatus ?status
}
{
  ?s dcat:endpointURL ?e .

  # Here we rely on Jena's substitution mechanism in QueryIterService.java,
  # which is sufficient for my use case
  SERVICE SILENT ?e { 
    # If the request fails, we get a single binding without any variables bound
    { SELECT ?t { ?x a ?t } LIMIT 1 }
  }

  BIND(IF(BOUND(?t), "online", "offline") AS ?status)
}
{code}

However, the query blocks after a while because it exhausts the HTTP connection 
pool. I have not yet identified all the causes, but one I could spot is here:

* The InputStream opened at 
[Service.java#L172|https://github.com/apache/jena/blob/64253b9de5924006cdd46f1e3492a92031842d3b/jena-arq/src/main/java/org/apache/jena/sparql/engine/http/Service.java#L172]
 is not wrapped in a try-catch block, so if the subsequent XML parsing fails, 
it is never closed.
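
To illustrate why a single unclosed stream can eventually block the whole query, here is a minimal, self-contained Java sketch. It does not use Jena or Apache HttpClient code: the `Semaphore` is only a stand-in for the HTTP connection pool, and `LeakSketch`, `LeasedStream`, `open`, `parse`, `leakyFetch`, `safeFetch` and `permitsLeft` are hypothetical names made up for this illustration. The assumption is that a pooled connection is only returned when the response stream is closed, which is the behaviour described above.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Semaphore;

/** Illustration only: a Semaphore plays the role of the HTTP connection pool. */
public class LeakSketch {

    /** A response stream that holds one pool permit until close() is called. */
    static final class LeasedStream extends FilterInputStream {
        private final Semaphore pool;
        private boolean released = false;

        LeasedStream(Semaphore pool, byte[] body) {
            super(new ByteArrayInputStream(body));
            this.pool = pool;
        }

        @Override
        public void close() throws IOException {
            super.close();
            if (!released) {          // give the permit back exactly once
                released = true;
                pool.release();
            }
        }
    }

    /** Leases a "connection"; returns null if the pool is exhausted (a real pool would block here). */
    static LeasedStream open(Semaphore pool, String body) {
        if (!pool.tryAcquire()) return null;
        return new LeasedStream(pool, body.getBytes(StandardCharsets.UTF_8));
    }

    /** Stand-in for XML result parsing: rejects any body that does not start with '<'. */
    static void parse(InputStream in) throws IOException {
        if (in.read() != '<') throw new IOException("response is not XML");
    }

    /** Leaky shape: if parse() throws, close() is skipped and the permit is lost. */
    static boolean leakyFetch(Semaphore pool, String body) {
        try {
            LeasedStream in = open(pool, body);
            if (in == null) return false;
            parse(in);
            in.close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Safe shape: try-with-resources closes the stream on every path, including parse failures. */
    static boolean safeFetch(Semaphore pool, String body) {
        try (LeasedStream in = open(pool, body)) {
            if (in == null) return false;
            parse(in);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Runs `calls` fetches against a non-XML body and returns the permits left in a pool of `size`. */
    public static int permitsLeft(int size, int calls, boolean safe) {
        Semaphore pool = new Semaphore(size);
        for (int i = 0; i < calls; i++) {
            if (safe) safeFetch(pool, "503 Service Unavailable");
            else leakyFetch(pool, "503 Service Unavailable");
        }
        return pool.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + permitsLeft(5, 20, false)); // prints "leaky: 0"
        System.out.println("safe:  " + permitsLeft(5, 20, true));  // prints "safe:  5"
    }
}
```

In the leaky variant, every endpoint that answers with something other than XML costs one permit, so a list with more broken endpoints than the pool has connections would stall. The try-with-resources variant keeps the pool full regardless of how many responses fail to parse.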

Maybe this prompts ideas about other potential spots. I have a local Jena 
checkout and will try to find out whether there are any other leaks. My goal is 
to have the query complete over the whole endpoint list, even though many of 
the URLs refer to services that are by now broken.


I am aware of the context settings in 
https://jena.apache.org/documentation/query/service.html, but I have not 
changed any of them, in particular the timeouts, since so far the issue really 
is the exhaustion of the connection pool.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
