[ 
https://issues.apache.org/jira/browse/JENA-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated JENA-1063:
----------------------------
    Attachment: Jena1063.java

Attaching sample code that demonstrates the issue. The code assumes a running 
Fuseki service.

You can reproduce this with a fairly small dataset: 2000 or more triples 
gives a noticeable delay (~20 seconds), and something in the region of 4000 
triples gives a much bigger delay (1 minute plus), e.g.

{noformat}
execSelect() called at Friday, 6 November 2015 12:11:04 o'clock PST
close() called at Friday, 6 November 2015 12:11:06 o'clock PST
close() returned at Friday, 6 November 2015 12:12:17 o'clock PST
{noformat}

> QueryEngineHTTP.close() may hang for a long time
> ------------------------------------------------
>
>                 Key: JENA-1063
>                 URL: https://issues.apache.org/jira/browse/JENA-1063
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.0.0
>            Reporter: Rob Vesse
>         Attachments: Jena1063.java
>
>
> Stumbled on this behaviour by mistake. Essentially, if you issue a remote 
> query and call {{close()}} on the {{QueryEngineHTTP}} before having 
> consumed all the results, your application can hang until all the data is 
> consumed from the response stream.
> This behaviour is caused by Apache HTTP Client, which assumes it can re-use 
> connections. To do so it first needs to have consumed the previous 
> response, so it will sit in a tight loop until it has done this.  Note that 
> this won't always happen because HTTP Client will inspect various aspects of 
> the response to decide whether it can re-use the connection.  However, unless 
> certain conditions are met, HTTP Client will default to the connection re-use 
> behaviour.
> This is obviously bad for the user because if they've told us to close the 
> execution then clearly they want us to dispose of it and carry on ASAP.
> It also causes issues for the server because rather than dropping the 
> connection HTTP Client continues to read from the server so the server may 
> also be stuck in a semi-hung state doing a lot of work that the actual user 
> is never going to see.
> Steps to reproduce:
> # Start up Fuseki
> # Run a simple Jena app that creates a query that will take a long time (e.g. 
> a large cross product) and issue it to Fuseki, then call {{close()}} on the 
> {{QueryExecution}}
> You should observe that the Jena app hangs until Fuseki reports the query as 
> completed.  If you log the current time before and after calling {{close()}} 
> you should see a large delay (assuming a sufficiently long running query).
> There are several possible solutions that come to mind:
> # Upgrade to newer HTTP Client and hope it does not have the behaviour
> # Disable connection re-use when providing our own HTTP client
> # When we know we will shut down the client (and thus re-use is irrelevant) 
> terminate the client rather than explicitly closing the connection
> 1 is likely to be problematic because APIs have changed significantly and 
> there are dependency conflicts with other modules such as {{jena-text}}.  
> Also I do not expect that newer versions will have changed their behaviour in 
> this regard, so it would be ineffective anyway.
> 2 is intrusive but effective.
> 3 may actually be the best option because it does not need us to explicitly 
> configure a connection re-use strategy.  Instead it allows us to simply kill 
> off the client, which we were potentially going to do anyway (unless the user 
> customised the HTTP Client being used).  This kills the connections without 
> having to first consume the response, and by killing the connection we should 
> also abort the work on the server side, because the server should notice the 
> dropped connection and stop trying to calculate and send further results.
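
The difference between consuming the rest of the response on close and simply dropping the connection (option 3 in the description) can be sketched without Jena or HTTP Client. This is only an illustration under stated assumptions: {{SlowResponseStream}}, {{closeByDraining}} and {{closeByAborting}} are hypothetical names invented for this sketch and are not part of either library, and the sleep in {{read()}} stands in for a server that is still computing results.

```java
import java.io.InputStream;

public class DrainVersusAbort {

    /** Hypothetical stand-in for a slow HTTP response body; not an HTTP Client class. */
    public static final class SlowResponseStream extends InputStream {
        private final long delayMillis;
        private int remaining;
        private boolean aborted;
        private int bytesRead;

        public SlowResponseStream(int totalBytes, long delayMillis) {
            this.remaining = totalBytes;
            this.delayMillis = delayMillis;
        }

        @Override
        public int read() {
            if (aborted || remaining == 0) {
                return -1; // end of body, or the connection was already dropped
            }
            try {
                Thread.sleep(delayMillis); // the server is still producing results
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
            remaining--;
            bytesRead++;
            return 'x';
        }

        /** Simulates dropping the connection without reading what is left. */
        public void abort() {
            aborted = true;
        }

        public int bytesRead() {
            return bytesRead;
        }
    }

    /** Close in the re-use style: read until the body is exhausted. */
    public static int closeByDraining(SlowResponseStream in) {
        while (in.read() != -1) {
            // discard the data; the caller never sees it
        }
        return in.bytesRead();
    }

    /** Close in the terminate style: drop the stream and read nothing more. */
    public static int closeByAborting(SlowResponseStream in) {
        in.abort();
        return in.bytesRead();
    }

    public static void main(String[] args) {
        SlowResponseStream drained = new SlowResponseStream(200, 5);
        drained.read(); // the application consumes one byte, then closes
        long start = System.nanoTime();
        int drainedBytes = closeByDraining(drained);
        long drainMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("drain: " + drainedBytes + " bytes read, close took " + drainMillis + " ms");

        SlowResponseStream dropped = new SlowResponseStream(200, 5);
        dropped.read();
        start = System.nanoTime();
        int droppedBytes = closeByAborting(dropped);
        long abortMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("abort: " + droppedBytes + " bytes read, close took " + abortMillis + " ms");
    }
}
```

In the draining case the time spent in close grows with the amount of unread data, which matches the delays reported above. In the aborting case close returns without reading anything further.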



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
