[ https://issues.apache.org/jira/browse/HTTPCORE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleg Kalnichevski resolved HTTPCORE-609. ---------------------------------------- Fix Version/s: 4.4.13 Resolution: Fixed Merged to 4.4.x. Oleg > Null Pointer Exception because of race condition in DefaultConnectingIOReactor > ------------------------------------------------------------------------------ > > Key: HTTPCORE-609 > URL: https://issues.apache.org/jira/browse/HTTPCORE-609 > Project: HttpComponents HttpCore > Issue Type: Bug > Components: HttpCore NIO > Affects Versions: 4.4.12 > Reporter: Anurag Agarwal > Priority: Major > Fix For: 4.4.13 > > Time Spent: 40m > Remaining Estimate: 0h > > There is a race condition because of which a null pointer exception happens > rendering the IOReactor break. > > {code:java} > [ERROR] [] 2019-10-16 11:27:31.904 [pool-2-thread-1] InternalHttpAsyncClient > - I/O reactor terminated abnormally[ERROR] [] 2019-10-16 11:27:31.904 > [pool-2-thread-1] InternalHttpAsyncClient - I/O reactor terminated > abnormallyjava.lang.NullPointerException: null at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:170) > ~[AdExchange%23%23stable_1.32.23.44.jar:?] at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) > ~[AdExchange%23%23stable_1.32.23.44.jar:?] at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) > ~[AdExchange%23%23stable_1.32.23.44.jar:?] at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) > ~[AdExchange%23%23stable_1.32.23.44.jar:?] at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > ~[AdExchange%23%23stable_1.32.23.44.jar:?] at > java.lang.Thread.run(Thread.java:748) [?:1.8.0_141] > {code} > > > This is a tough race condition to reproduce, I did it by using debuggers. > This was introducted by > [https://github.com/apache/httpcomponents-core/commit/aa812282f26fdd1975233a892c5405fa0da781b4#diff-d577e717cb1e97f3a3c0adbc8d563062] > > I will try my best to explain the steps of what happens in this scenario: > # A session request is submitted to establish a connection to the > ConnectingIOReactor. > # > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L258-L260] > Here it is checked if the request is already completed or not. If not then a > socket is assigned and there after the request takes place. > # The race condition happens when after the second step and between the > assignment of the selection key > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318] > the request is cancelled. > # In this case from > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/SessionRequestImpl.java#L208-L227] > since the key has not yet being assigned, there is nothing to cancel, the > request is cancelled successuly and the interested parties are informed as > well via callbacks. > # When > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L316-L318] > executes now, we have got a condition where we have a cancelled request with > an active selection key. > # After the completion of processSessionRequests method, let us say we have > also reached the expiry time of the request and in the same call we hit > processTimeouts, where the current request is timedout. Since the request is > already in completed state (via cancel) it won't be timedout but the > attachement of the key will get an assignement null via the finally block. > # At this point, the request is still completed while the key is still not > cancelled. The key now has the attachment null. When the key becomes ready, > it will be passed to processEvent method where > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L169-L170] > will produce null pointer exception since the attachment is null. > This race condition might seem like a rare scenario but under heavy load, > this is not that difficult to reach. In those scenarios, the IOReactor breaks > and no further request can be made via the client. > > I tried adding a test case for the same, but couldn't. I will let you know > the params that I am using and the point where I had put the debug point if > that helps. > > soTimeout -> 1000 > connectTimeout -> 1000 > soKeepAlive -> true > tcpNoDelay -> true > selectInterval -> 10 > soThreadCount -> 4 > soReuseAddress -> true > > DefaultRequestConfig > socketTimeout -> 1000 > connectTimeout -> 1000 > > Debug Point: > [https://github.com/apache/httpcomponents-core/blob/e7b4f5cc306b3eb4b6e1e24627971049001ddd68/httpcore-nio/src/main/java/org/apache/http/impl/nio/reactor/DefaultConnectingIOReactor.java#L263] > > At this point I will call request.cancel and wait for 1-2 seconds before I > move forward, and it reproduces the above scenario. > > A suggested solution is to check while assigning key to request if it is > already completed then cancel the key and close the channel. We have been > using this in production, and it resolves the above issue. I will also submit > a corresponding PR for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org For additional commands, e-mail: dev-h...@hc.apache.org