Rob Vesse created JENA-1063:
-------------------------------

             Summary: QueryEngineHTTP.close() may hang for a long time
                 Key: JENA-1063
                 URL: https://issues.apache.org/jira/browse/JENA-1063
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 3.0.0
            Reporter: Rob Vesse


Stumbled on this behaviour by accident.  Essentially, if you issue a remote query 
and call {{close()}} on the {{QueryEngineHTTP}} before consuming all 
the results, your application can hang until all the data has been consumed from 
the response stream.

This behaviour is caused by Apache HTTP Client, which assumes it can re-use 
connections.  To do so it first needs to consume the previous response, so it 
sits in a tight loop until it has done this.  Note that this 
won't always happen because HTTP Client inspects various aspects of the 
response to decide whether it can re-use the connection.  However, unless 
certain conditions are met, HTTP Client defaults to the connection re-use 
behaviour.
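
To make the mechanism concrete, here is a minimal stdlib-only sketch of the drain-on-close pattern described above.  It is a simulation, not HTTP Client's actual code: {{ResponseStream}}, {{reuseConnection}} and {{bytesDrainedOnClose}} are hypothetical names invented for illustration, and a byte array stands in for the network response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DrainOnCloseDemo {

    /** Hypothetical stand-in for a response stream tied to a pooled connection. */
    static class ResponseStream extends InputStream {
        private final InputStream body;
        private final boolean reuseConnection;
        long bytesReadDuringClose = 0;

        ResponseStream(InputStream body, boolean reuseConnection) {
            this.body = body;
            this.reuseConnection = reuseConnection;
        }

        @Override
        public int read() throws IOException {
            return body.read();
        }

        @Override
        public void close() throws IOException {
            if (reuseConnection) {
                // To hand the connection back for re-use, the rest of the
                // response has to be read first. On a real socket this loop
                // blocks until the server has finished sending.
                while (body.read() != -1) {
                    bytesReadDuringClose++;
                }
            }
            // Without re-use the connection can simply be dropped.
            body.close();
        }
    }

    /** Reads consumedFirst bytes, closes, and reports how many bytes close() had to read. */
    static long bytesDrainedOnClose(int bodySize, int consumedFirst, boolean reuse) {
        try {
            ResponseStream s = new ResponseStream(
                    new ByteArrayInputStream(new byte[bodySize]), reuse);
            for (int i = 0; i < consumedFirst; i++) {
                s.read();
            }
            s.close();
            return s.bytesReadDuringClose;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with re-use:    " + bytesDrainedOnClose(1000, 10, true));
        System.out.println("without re-use: " + bytesDrainedOnClose(1000, 10, false));
    }
}
```

In the sketch the cost of {{close()}} with re-use enabled grows with the unread part of the response, which is the same shape as the hang reported here.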

This is obviously bad for the user because if they've told us to close the 
execution then clearly they want us to dispose of it and carry on ASAP.

It also causes issues for the server because rather than dropping the 
connection HTTP Client continues to read from the server so the server may also 
be stuck in a semi-hung state doing a lot of work that the actual user is never 
going to see.

Steps to reproduce:

# Start up Fuseki
# Run a simple Jena app that creates a query that will take a long time (e.g. a 
large cross product) and issues it to Fuseki, then call {{close()}} on the 
{{QueryExecution}} before consuming all the results

You should observe that the Jena app hangs until Fuseki reports the query as 
completed.  If you log the current time before and after calling {{close()}} 
you should see a large delay (assuming a sufficiently long running query).
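
The timing check can be sketched as below.  This is only a harness under stated assumptions: {{CloseTimer}}, {{timeMillis}} and {{sleepQuietly}} are hypothetical names, and the slow close is simulated with a sleep so the example runs without Jena or Fuseki.  In a real reproduction the {{Runnable}} passed in would be the {{close()}} call on the {{QueryExecution}}.

```java
import java.util.concurrent.TimeUnit;

public class CloseTimer {

    /** Runs the given close action and returns the elapsed wall-clock time in ms. */
    static long timeMillis(Runnable closeAction) {
        long start = System.nanoTime();
        closeAction.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Sleeps without forcing callers to handle InterruptedException. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow close(). With Jena on the classpath this would
        // instead be something like timeMillis(qe::close), where qe is the
        // QueryExecution whose results were only partly consumed (assumption:
        // not shown here so that the sketch stays runnable on its own).
        long elapsed = timeMillis(() -> sleepQuietly(200));
        System.out.println("close() took " + elapsed + " ms");
    }
}
```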

There are several possible solutions that come to mind:

# Upgrade to newer HTTP Client and hope it does not have the behaviour
# Disable connection re-use when providing our own HTTP client
# When we know we will shut down the client (and thus re-use is irrelevant) 
terminate the client rather than explicitly closing the connection

1 is likely to be problematic because APIs have changed significantly and there 
are dependency conflicts with other modules such as {{jena-text}}.  Also, I do 
not expect that newer versions will have changed their behaviour in this regard, 
so it would be ineffective anyway.

2 is intrusive but effective.

3 may actually be the best option because it does not need us to explicitly 
configure a connection re-use strategy.  Instead it lets us simply kill off 
the client, which we were potentially going to do anyway (unless the user 
customised the HTTP Client being used).  That kills the connections without 
having to first consume the response.  Killing the connection should 
also abort the work on the server side, because the server should notice the 
dropped connection and stop trying to calculate and send further results.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
