[
https://issues.apache.org/jira/browse/JENA-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rob Vesse updated JENA-1063:
----------------------------
Attachment: Jena1063.java
Attaching sample code that demonstrates the issue; the code assumes a running
Fuseki service.
You can reproduce this with a fairly small dataset: 2000 or more triples gives
a noticeable delay (~20 seconds), while something in the region of 4000
triples gives a much bigger delay (1 minute plus), e.g.
{noformat}
execSelect() called at Friday, 6 November 2015 12:11:04 o'clock PST
close() called at Friday, 6 November 2015 12:11:06 o'clock PST
close() returned at Friday, 6 November 2015 12:12:17 o'clock PST
{noformat}
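Timestamps like the ones above can be collected by logging around the call to close(). The following is a minimal stand-alone sketch of such a harness, not the attached Jena1063.java. The class name CloseTiming, the method timeClose and the sleeping stand-in resource are all hypothetical; a real run would pass something like qexec::close for a query execution issued against Fuseki.

```java
import java.time.Instant;

public class CloseTiming {

    /**
     * Logs the wall-clock time before and after close() and returns how
     * long the call blocked, in milliseconds.
     */
    public static long timeClose(AutoCloseable resource) {
        System.out.println("close() called at " + Instant.now());
        long start = System.nanoTime();
        try {
            resource.close();
        } catch (Exception e) {
            throw new RuntimeException("close() failed", e);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("close() returned at " + Instant.now());
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a query execution whose close() blocks.
        AutoCloseable slowToClose = () -> Thread.sleep(200);
        long blockedFor = timeClose(slowToClose);
        System.out.println("close() blocked for ~" + blockedFor + " ms");
    }
}
```

A large gap between the two logged times is the symptom described in this issue.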
> QueryEngineHTTP.close() may hang for a long time
> ------------------------------------------------
>
> Key: JENA-1063
> URL: https://issues.apache.org/jira/browse/JENA-1063
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.0.0
> Reporter: Rob Vesse
> Attachments: Jena1063.java
>
>
> Stumbled on this behaviour by mistake. Essentially, if you issue a remote
> query and call {{close()}} on the {{QueryEngineHTTP}} before having
> consumed all the results, your application can hang until all the data is
> consumed from the response stream.
> This behaviour is caused by Apache HTTP Client, which assumes it can re-use
> connections. To do so it first needs to have consumed the previous
> response, so it will sit in a tight loop until it has done this. Note that
> this won't always happen, because HTTP Client inspects various aspects of
> the response to decide whether it can re-use the connection. However, unless
> certain conditions are met, HTTP Client defaults to the connection re-use
> behaviour.
> This is obviously bad for the user: if they've told us to close the
> execution then clearly they want us to dispose of it and carry on ASAP.
> It also causes issues for the server because, rather than dropping the
> connection, HTTP Client continues to read from the server, so the server may
> also be stuck in a semi-hung state doing a lot of work that the actual user
> is never going to see.
> Steps to reproduce:
> # Start up Fuseki
> # Run a simple Jena app that creates a query that will take a long time (e.g.
> a large cross product) and issue it to Fuseki, then call {{close()}} on the
> {{QueryExecution}}
> You should observe that the Jena app hangs until Fuseki reports the query as
> completed. If you log the current time before and after calling {{close()}}
> you should see a large delay (assuming a sufficiently long running query).
> There are several possible solutions that come to mind:
> # Upgrade to newer HTTP Client and hope it does not have the behaviour
> # Disable connection re-use when providing our own HTTP client
> # When we know we will shut down the client (and thus re-use is irrelevant)
> terminate the client rather than explicitly closing the connection
> Option 1 is likely to be problematic because the APIs have changed
> significantly and there are dependency conflicts with other modules such as
> {{jena-text}}. Also, I do not expect that newer versions will have changed
> their behaviour in this regard, so it would be ineffective anyway.
> Option 2 is intrusive but effective.
> Option 3 may actually be the best choice because it does not need us to
> explicitly configure a connection re-use strategy. Instead, it allows us to
> simply kill off the client, which we were potentially going to do anyway
> (unless the user customised the HTTP Client being used). That kills the
> connections without having to first consume the response, and by killing the
> connection we should also abort the work on the server side, because the
> server should notice the dropped connection and stop trying to calculate and
> send further results.
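To illustrate the difference between the re-use behaviour described in the issue and the "kill the connection" approach of option 3, here is a small simulation that uses only the JDK. It is not HTTP Client's code and does not touch Jena; the names DrainVsAbort, SlowBody, closeByDraining and closeByAborting are hypothetical, and the slow body is an assumed stand-in for a server that is still computing results.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DrainVsAbort {

    /** Simulated slow response body: each byte takes delayMillis to arrive. */
    public static class SlowBody extends InputStream {
        private final int total;
        private final long delayMillis;
        private int produced = 0;

        public SlowBody(int total, long delayMillis) {
            this.total = total;
            this.delayMillis = delayMillis;
        }

        @Override
        public int read() throws IOException {
            if (produced >= total) {
                return -1; // end of the simulated response
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while reading", e);
            }
            produced++;
            return 'x';
        }

        /** Number of bytes the simulated server has had to produce so far. */
        public int produced() {
            return produced;
        }
    }

    /** Re-use style close: consume the rest of the body, then close. */
    public static long closeByDraining(InputStream body) {
        long start = System.nanoTime();
        try {
            while (body.read() != -1) {
                // discard data the caller no longer wants
            }
            body.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Abort style close: drop the body without reading any more of it. */
    public static long closeByAborting(InputStream body) {
        long start = System.nanoTime();
        try {
            body.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        SlowBody drained = new SlowBody(50, 2);
        System.out.println("drain: " + closeByDraining(drained) + " ms, "
                + drained.produced() + " bytes produced");

        SlowBody aborted = new SlowBody(50, 2);
        System.out.println("abort: " + closeByAborting(aborted) + " ms, "
                + aborted.produced() + " bytes produced");
    }
}
```

In the simulation the draining close pays for every remaining byte, while the aborting close returns without asking the simulated server for anything further, which is the effect option 3 is aiming for.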