[ https://issues.apache.org/jira/browse/JENA-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994419#comment-14994419 ]

ASF subversion and git services commented on JENA-1063:
-------------------------------------------------------

Commit e059b78a0dbc0851ead9411767053f077c0fff23 in jena's branch 
refs/heads/master from [~rvesse]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=e059b78 ]

Fix for JENA-1063

This commit changes the behaviour of QueryEngineHTTP to avoid an
undesirable behaviour of Apache HTTP Client, which we use for our HTTP
communications.

HTTP Client tries to re-use connections by default, which means that it
must finish consuming a response before it can close the InputStream
associated with that response.  However, this can cause a hang and leave
both the client and the remote server stuck doing unnecessary work.  Most
of the time we are going to clean up the HTTP Client anyway, so leaving
the connections available for re-use makes no sense for us.

The change is essentially to check whether we are going to clean up the
HTTP Client and, if so, do that first, so that when we close the
associated InputStream the underlying connection is already closed and
it can't (and won't) waste time trying to consume the remaining response.
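To illustrate why the ordering matters, here is a minimal, stdlib-only Java model. It is not Jena or Apache HTTP Client code: FakeConnection, DrainOnCloseStream, closeAndCount and the willCleanUpClient flag are hypothetical stand-ins, assuming that closing a response stream drains the remaining bytes (so the connection could be re-used) unless the connection has already been shut down.

```java
import java.io.InputStream;

/**
 * Hypothetical, stdlib-only model of the JENA-1063 close-ordering problem.
 * None of these classes are Jena or Apache HTTP Client APIs.
 */
public class CloseOrderDemo {

    /** Stands in for the connection owned by the HTTP client. */
    static final class FakeConnection {
        volatile boolean shutDown = false;
        long remaining;               // bytes the server still intends to send

        FakeConnection(long remaining) { this.remaining = remaining; }
    }

    /**
     * Stands in for the response stream: close() drains whatever is left so the
     * connection could be re-used, unless the connection was already shut down.
     */
    static final class DrainOnCloseStream extends InputStream {
        private final FakeConnection conn;
        long drained = 0;             // bytes consumed during close()

        DrainOnCloseStream(FakeConnection conn) { this.conn = conn; }

        @Override
        public int read() {
            // A shut-down connection yields EOF immediately.
            if (conn.shutDown || conn.remaining == 0) return -1;
            conn.remaining--;
            return 0;
        }

        @Override
        public void close() {
            // Consume the rest of the response, as a re-use-minded client would.
            while (read() != -1) drained++;
        }
    }

    /**
     * Closes the stream and reports how many bytes close() had to drain.
     * willCleanUpClient models "we are going to clean up the HTTP Client anyway".
     */
    public static long closeAndCount(boolean willCleanUpClient, long pending) {
        FakeConnection conn = new FakeConnection(pending);
        DrainOnCloseStream in = new DrainOnCloseStream(conn);
        if (willCleanUpClient) {
            conn.shutDown = true;     // shut the client down first...
        }
        in.close();                   // ...then close the InputStream
        return in.drained;
    }

    public static void main(String[] args) {
        // prints "stream first: drained 1000000"
        System.out.println("stream first: drained " + closeAndCount(false, 1_000_000));
        // prints "client first: drained 0"
        System.out.println("client first: drained " + closeAndCount(true, 1_000_000));
    }
}
```

In this model the drain cost grows with the unread response when the stream is closed first, and drops to zero when the connection is shut down first. A user-supplied client would correspond to willCleanUpClient being false, keeping the normal re-use behaviour.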


> QueryEngineHTTP.close() may hang for a long time
> ------------------------------------------------
>
>                 Key: JENA-1063
>                 URL: https://issues.apache.org/jira/browse/JENA-1063
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.0.0
>            Reporter: Rob Vesse
>         Attachments: Jena1063.java
>
>
> Stumbled on this behaviour by mistake: essentially, if you issue a remote
> query and call {{close()}} on the {{QueryEngineHTTP}} before having
> consumed all the results, your application can hang until all the data has
> been consumed from the response stream.
> This behaviour is caused by Apache HTTP Client, which assumes it can re-use
> connections but, in order to do so, first needs to have consumed the previous
> response, so it will sit in a tight loop until it has done this.  Note that
> this won't always happen, because HTTP Client inspects various aspects of
> the response (for example a {{Connection: close}} header) to decide whether it
> can re-use the connection.  However, unless one of those conditions rules out
> re-use, HTTP Client defaults to the connection re-use behaviour.
> This is obviously bad for the user, because if they've told us to close the
> execution then clearly they want us to dispose of it as soon as possible.
> It also causes issues for the server, because rather than dropping the
> connection HTTP Client continues to read from the server, so the server may
> also be stuck in a semi-hung state doing a lot of work whose results the user
> is never going to see.
> Steps to reproduce:
> # Start up Fuseki
> # Run a simple Jena app that creates a query that will take a long time (e.g.
> a large cross product), issues it to Fuseki, and then calls {{close()}} on the
> {{QueryExecution}}
> You should observe that the Jena app hangs until Fuseki reports the query as
> completed.  If you log the current time before and after calling {{close()}}
> you should see a large delay (assuming a sufficiently long-running query).
> There are several possible solutions that come to mind:
> # Upgrade to a newer HTTP Client and hope it does not have this behaviour
> # Disable connection re-use when providing our own HTTP client
> # When we know we will shut down the client (and thus re-use is irrelevant) 
> terminate the client rather than explicitly closing the connection
> Option 1 is likely to be problematic because the APIs have changed
> significantly and there are dependency conflicts with other modules such as
> {{jena-text}}.  Also, I do not expect newer versions to have changed their
> behaviour in this regard, so it would likely be ineffective anyway.
> Option 2 is intrusive but effective.
> Option 3 may actually be the best option because it does not require us to
> explicitly configure a connection re-use strategy.  Instead it lets us simply
> kill off the client, which we were potentially going to do anyway (unless the
> user customised the HTTP Client being used).  This kills the connections
> without first consuming the response, and by killing the connection we should
> also abort the work on the server side, because the server should notice the
> dropped connection and stop trying to calculate and send further results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
