[
https://issues.apache.org/jira/browse/HTTPCORE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893348#comment-16893348
]
Linton Miller commented on HTTPCORE-584:
----------------------------------------
I was afraid that might happen. Being a threading bug, it's timing sensitive;
depends on CPU speed, network speed, etc. I tried to make it happen reasonably
reliably by making the number of connections large without being completely
ridiculous, seems that's not enough.
I've uploaded a second version of the test class that adds synchronization
(it's a bit more of a contrived example as a result, but doesn't invalidate the
test). Hopefully that makes the timings more likely to cause problems, but
there are still no guarantees. It can depend on how many request there are and
how fast they process. e.g. when I run from New Zealand, I find that I want to
allow about 100ms sleep before closing the idle connection, whereas running
from the US, you only want about 20ms (because the requests are served more
quickly). You can supply the number of threads and delay as command line params
to the second test class:
java LaxConnPoolThreadingTest2 <num-requests> <close-delay-milliseconds>
You can also compile the class with DEBUG = true, which allows you to see the
same PoolEntry objects being referenced in the closeIdle and lease threads
concurrently.
Regardless of whether you can get either of these test classes to exhibit the
failure, hopefully you can see the bug just by inspection of the code (which is
how I first noticed it):
Consider the simple case of just a single PerRoutePool with at least 1
available connection in it, and 2 threads. Thread 1 executes the closeIdle,
which iterates over all the available connections, and Thread 2 executes a
lease request for an available connection from the pool. Then the timing
sequence:
Thread 1:
closeIdle
enumAvailable
for (final Iterator<PoolEntry<T, C>> it = available.iterator();
it.hasNext(); ) {
final PoolEntry<T, C> entry = it.next();
callback.execute(entry);
if (entry.getUpdated() <= deadline) {
but before Thread 1 progresses any further and actually executes the next line
to discard the connection
entry.discardConnection(CloseMode.GRACEFUL);
Thread 2 starts executing:
lease
final PoolEntry<T, C> availableEntry = getAvailableEntry(state);
final PoolEntry<T, C> entry = available.poll();
At this point Thread 1 and Thread 2 now both have a reference to the same
PoolEntry, and that's a problem.
For example, if Thread 2 is a ConnectionManager, and execution continues on to
completes the lease call and returns the PoolEntry, from which the
ConnectionManager returns a connection to a client that sends out an HTTP GET
request. Then Thread 1 springs back into life and executes the
entry.discardConnection, and it's closed the connection that Thread 2 is
actively in the middle of using for its HTTP request.
Boiling that down to the fundamental problem, the enumAvailable method takes
references to the PoolEntries in the available pool without blocking those
entries from being used concurrently on a different Thread.
PoolEntries are not themselves thread-safe items. e.g. you obviously can't have
2 threads both trying to have a conversation on the same HttpClientConnection
at once. It's up to the connection pool to establish the _guarantee_ that a
PoolEntry can only ever be accessed by a single thread at once.
You can see the difference comparing to the StrictConnPool, which locks the
entire pool while it iterates over it, thus guarantee no PoolEntry will be
accessed by anything other that the iterator while it's active.
> Iterating a LaxConnPool is not thread-safe
> ------------------------------------------
>
> Key: HTTPCORE-584
> URL: https://issues.apache.org/jira/browse/HTTPCORE-584
> Project: HttpComponents HttpCore
> Issue Type: Bug
> Components: HttpCore
> Affects Versions: 5.0-beta8
> Reporter: Linton Miller
> Priority: Minor
> Attachments: LaxConnPoolThreadingTest.java,
> LaxConnPoolThreadingTest2.java
>
>
> Iterating over connections in a LaxConnPool can cause unexpected connection
> exceptions; for example, calling the closeIdle or closeExpired methods at the
> same time as threads are requesting connections from the pool.
> This is demonstrated by the attached test class, which creates a LaxConnPool
> for a PoolingHttpClientConnectionManager and then uses that to execute a
> number of concurrent requests, sleeps a bit, then re-executes concurrent
> requests at the same time as closing idle connections in the pool. Being a
> threading bug, it's not a guaranteed fail every time, but within a few runs,
> the test code will normally get at least one request thread that fails. e.g
> {noformat}
> Thread req-000000AB request failed!
> org.apache.hc.core5.http.ConnectionClosedException: Connection is closed
> at
> org.apache.hc.core5.http.impl.io.BHttpConnectionBase.ensureOpen(BHttpConnectionBase.java:98)
> at
> org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:186)
> at
> org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:181)
> at
> org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:224)
> at
> org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:596)
> at
> org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:220)
> at
> org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:175)
> ...
> at
> org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:77)
> at
> org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:102)
> at
> LaxConnPoolThreadingTest$2.executeReq(LaxConnPoolThreadingTest.java:69)
> at
> LaxConnPoolThreadingTest$2.run(LaxConnPoolThreadingTest.java:58){noformat}
> or
>
> {noformat}
> Thread req-00000179 request failed!
> java.lang.IllegalStateException: Endpoint is not connected
> at org.apache.hc.core5.util.Asserts.check(Asserts.java:38)
> at
> org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.getValidatedPoolEntry(PoolingHttpClientConnectionManager.java:548)
> at
> org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:592)
> ...{noformat}
> There may also be recoverable I/O exceptions logged:
> {noformat}
> 2019-07-24 09:17:27,488 INFO
> [org.apache.hc.client5.http.impl.classic.RetryExec] Recoverable I/O exception
> (java.net.SocketException) caught when processing request to
> {}->http://httpbin.org:80{noformat}
>
> This appear to be because the iteration methods of LaxConnPool are not
> thread-safe. Though their access to the internal queue structures are
> protected by the queues being ConcurrentLinkedDeque, the iteration provides
> the means for the same pool entry to be accessed on multiple threads at once
> because there is no locking or concurrency control on the individual pool
> entries as the callback is executed through enumAvailable or enumLeased.
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]