Rob Vesse created JENA-1063:
-------------------------------
Summary: QueryEngineHTTP.close() may hang for a long time
Key: JENA-1063
URL: https://issues.apache.org/jira/browse/JENA-1063
Project: Apache Jena
Issue Type: Bug
Components: ARQ
Affects Versions: Jena 3.0.0
Reporter: Rob Vesse
I stumbled on this behaviour by accident. Essentially, if you issue a remote
query and call {{close()}} on the {{QueryEngineHTTP}} before consuming all the
results, your application can hang until all the data has been consumed from
the response stream.
This behaviour is caused by Apache HTTP Client, which assumes it can re-use
connections. To re-use a connection it must first consume the previous
response, so it sits in a tight loop until it has done so. Note that this
won't always happen, because HTTP Client inspects various aspects of the
response to decide whether it can re-use the connection. However, unless
certain conditions are met, HTTP Client defaults to the connection re-use
behaviour.
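To make the mechanism concrete, here is a minimal stdlib-only sketch that simulates the difference between draining a response on close and simply dropping it. It is not HTTP Client's code: {{DrainOnCloseDemo}}, {{CountingBody}}, {{closeForReuse}} and {{abort}} are hypothetical names invented for illustration, and an in-memory byte array stands in for the server's response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Simulation only: these types are hypothetical and are not part of HTTP Client or Jena. */
final class DrainOnCloseDemo {

    /** Stands in for a response body and counts the bytes pulled from "the server". */
    static final class CountingBody extends InputStream {
        private final InputStream source;
        long bytesRead = 0;

        CountingBody(int size) {
            this.source = new ByteArrayInputStream(new byte[size]);
        }

        @Override
        public int read() throws IOException {
            int b = source.read();
            if (b >= 0) {
                bytesRead++;
            }
            return b;
        }
    }

    /** Re-use style close: read to end of stream first, then close. */
    static void closeForReuse(CountingBody body) throws IOException {
        byte[] buf = new byte[4096];
        while (body.read(buf) != -1) {
            // discard the data; the only goal is to reach the end of the response
        }
        body.close();
    }

    /** Abort style close: give up on the rest of the response immediately. */
    static void abort(CountingBody body) throws IOException {
        body.close();
    }

    /** Reads one byte, closes early, and reports how many bytes were pulled in total. */
    static long bytesPulled(int size, boolean drain) {
        try {
            CountingBody body = new CountingBody(size);
            body.read(); // the caller consumes only the first byte of the results
            if (drain) {
                closeForReuse(body);
            } else {
                abort(body);
            }
            return body.bytesRead;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        System.out.println("drain on close pulled " + bytesPulled(size, true) + " bytes");
        System.out.println("abort pulled " + bytesPulled(size, false) + " bytes");
    }
}
```

With a real remote query the drained bytes have to be produced by the server and sent over the network, which is where the delay would come from.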
This is obviously bad for the user because if they've told us to close the
execution then clearly they want us to dispose of it and carry on ASAP.
It also causes issues for the server because rather than dropping the
connection HTTP Client continues to read from the server so the server may also
be stuck in a semi-hung state doing a lot of work that the actual user is never
going to see.
Steps to reproduce:
# Start up Fuseki
# Run a simple Jena app that creates a query that will take a long time (e.g. a
large cross product), issues it to Fuseki, then calls {{close()}} on the
{{QueryExecution}} without consuming all the results
You should observe that the Jena app hangs until Fuseki reports the query as
completed. If you log the current time before and after calling {{close()}}
you should see a large delay (assuming a sufficiently long running query).
There are several possible solutions that come to mind:
# Upgrade to newer HTTP Client and hope it does not have the behaviour
# Disable connection re-use when providing our own HTTP client
# When we know we will shut down the client (and thus re-use is irrelevant)
terminate the client rather than explicitly closing the connection
1 is likely to be problematic because APIs have changed significantly and there
are dependency conflicts with other modules such as {{jena-text}}. Also I do
not expect that newer versions will have changed their behaviour in this regard
so it would be ineffective anyway.
2 is intrusive but effective.
3 may actually be the best option because it does not require us to explicitly
configure a connection re-use strategy. Instead, it lets us simply kill off
the client, which we were potentially going to do anyway (unless the user
customised the HTTP Client being used). This kills the connections without
first having to consume the response. Killing the connection should also
abort the work on the server side, because the server should notice the
dropped connection and stop trying to calculate and send further results.