Anurag Agarwal created HTTPCORE-609:
---------------------------------------
Summary: Null Pointer Exception because of race condition in
DefaultConnectingIOReactor
Key: HTTPCORE-609
URL: https://issues.apache.org/jira/browse/HTTPCORE-609
Project: HttpComponents HttpCore
Issue Type: Bug
Components: HttpCore NIO
Affects Versions: 4.4.12
Reporter: Anurag Agarwal
There is a race condition because of which a null pointer exception happens
rendering the IOReactor break.
{code:java}
[ERROR] [] 2019-10-16 11:27:31.904 [pool-2-thread-1] InternalHttpAsyncClient -
I/O reactor terminated abnormally[ERROR] [] 2019-10-16 11:27:31.904
[pool-2-thread-1] InternalHttpAsyncClient - I/O reactor terminated
abnormallyjava.lang.NullPointerException: null at
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:170)
~[AdExchange%23%23stable_1.32.23.44.jar:?] at
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
~[AdExchange%23%23stable_1.32.23.44.jar:?] at
org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
~[AdExchange%23%23stable_1.32.23.44.jar:?] at
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
~[AdExchange%23%23stable_1.32.23.44.jar:?] at
org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
~[AdExchange%23%23stable_1.32.23.44.jar:?] at
java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
{code}
This is a tough race condition to reproduce, I did it by using debuggers. This
was introducted by
[https://github.com/apache/httpcomponents-core/commit/aa812282f26fdd1975233a892c5405fa0da781b4#diff-d577e717cb1e97f3a3c0adbc8d563062]
I will try my best to explain the steps of what happens in this scenario:
# A session request is submitted to establish a connection to the
ConnectingIOReactor.
#
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L258-L260]
Here it is checked if the request is already completed or not. If not then a
socket is assigned and there after the request takes place.
# The race condition happens when after the second step and between the
assignment of the selection key
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318]
the request is cancelled.
# In this case from
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/SessionRequestImpl.java#L208-L227]
since the key has not yet being assigned, there is nothing to cancel, the
request is cancelled successuly and the interested parties are informed as well
via callbacks.
# When
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318]
executes now, we have got a condition where we have a cancelled request with
an active selection key.
# After the completion of processSessionRequests method, let us say we have
also reached the expiry time of the request and in the same call we hit
processTimeouts, where the current request is timedout. Since the request is
already in completed state (via cancel) it won't be timedout but the
attachement of the key will get an assignement null via the finally block.
# At this point, the request is still completed while the key is still not
cancelled. The key now has the attachment null. When the key becomes ready, it
will be passed to processEvent method where
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L169-L170]
will produce null pointer exception since the attachment is null.
This race condition might seem like a rare scenario but under heavy load, this
is not that difficult to reach. In those scenarios, the IOReactor breaks and no
further request can be made via the client.
I tried adding a test case for the same, but couldn't. I will let you know the
params that I am using and the point where I had put the debug point if that
helps.
soTimeout -> 1000
connectTimeout -> 1000
soKeepAlive -> true
tcpNoDelay -> true
selectInterval -> 10
soThreadCount -> 4
soReuseAddress -> true
DefaultRequestConfig
socketTimeout -> 1000
connectTimeout -> 1000
Debug Point:
[https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L263]
At this point I will call request.cancel and wait for 1-2 seconds before I move
forward, and it reproduces the above scenario.
A suggested solution is to check while assigning key to request if it is
already completed then cancel the key and close the channel. We have been using
this in production, and it resolves the above issue. I will also submit a
corresponding PR for the same.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]