Magic Mark,

> On 17.06.2021 at 23:20, Mark Thomas <ma...@apache.org> wrote:
>
> On 17/06/2021 09:26, Mark Thomas wrote:
>
>> I think I might have found one contributing factor to this bug. I need to
>> run a series of tests to determine whether I am seeing random variation in
>> test results or a genuine effect.
>
> It was random effects but I believe I have now found the bug.
>
> Consider two threads, T1 and T2, writing HTTP/2 response bodies concurrently
> in the same HTTP/2 Connection.
>
> You'll need to have the code in front of you to follow what is going on.
>
> The write:
>
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364
>
> and the associated completion handler:
>
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044
>
>
> The detail of the code is fairly complex but all you really need to keep in
> mind is the following:
>
> - the writePending semaphore ensures only one thread can write at a time
>
> - the state of the write is maintained in an OperationState instance that is
>   stored in SocketWrapperBase.writeOperation (L1390)
>
> - the completion handler clears this state (L1050) and releases the
>   semaphore (L1046)
>
>
> The sequence of events for a failure is as follows:
>
> - T1 obtains the write semaphore (L1366)
> - T1 creates an OperationState and sets writeOperation (L1390)
> - the async write for T1 completes and the completion handler is called
> - T1's completion handler releases the semaphore (L1046)
> - T2 obtains the write semaphore (L1366)
> - T2 creates an OperationState and sets writeOperation (L1390)
> - T1's completion handler clears writeOperation (L1050)
> - the async write for T2 does not complete and the socket is added to
>   the Poller
> - the Poller signals the socket is ready for write
> - the Poller finds writeOperation is null so performs a normal dispatch
>   for write
> - the async write times out as it never receives the notification from
>   the Poller
>
> The fix is to swap the order of clearing writeOperation and releasing the
> semaphore.
>
> Concurrent reads will have the same problem and will be fixed by the same
> solution.
>
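For anyone following along without the Tomcat source open, here is a minimal sketch of the ordering problem described above. It is not Tomcat's code: the class WriteOrderingSketch, its methods, and the use of a plain String in place of OperationState are all made up for illustration. Instead of real threads it replays the interleaving deterministically, running "T2" in the window between the two steps of T1's completion handler.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the race; names only mirror the fields mentioned above.
class WriteOrderingSketch {
    // Stand-in for the writePending semaphore: one writer at a time.
    private final Semaphore writePending = new Semaphore(1);
    // Stand-in for writeOperation; a String replaces OperationState.
    private final AtomicReference<String> writeOperation = new AtomicReference<>();

    // A writer takes the semaphore, then records its operation state.
    boolean startWrite(String owner) {
        if (!writePending.tryAcquire()) {
            return false;
        }
        writeOperation.set(owner);
        return true;
    }

    // Completion handler. 'window' runs between the two steps and simulates
    // another thread being scheduled exactly there.
    void complete(boolean clearFirst, Runnable window) {
        if (clearFirst) {
            writeOperation.set(null);   // fixed order: clear, then release
            window.run();
            writePending.release();
        } else {
            writePending.release();     // failing order: release, then clear
            window.run();
            writeOperation.set(null);   // wipes whatever the window installed
        }
    }

    // Replays the T1/T2 sequence and returns the state a poller would find.
    static String replay(boolean clearFirst) {
        WriteOrderingSketch s = new WriteOrderingSketch();
        boolean[] t2Started = {false};
        s.startWrite("T1");
        s.complete(clearFirst, () -> t2Started[0] = s.startWrite("T2"));
        if (!t2Started[0]) {
            // T2 could not get the semaphore in the window; it starts afterwards.
            s.startWrite("T2");
        }
        return s.writeOperation.get();
    }

    public static void main(String[] args) {
        System.out.println("release-then-clear: " + replay(false)); // null
        System.out.println("clear-then-release: " + replay(true));  // T2
    }
}
```

With release-then-clear, T2's state is gone by the time the poller would look for it, which matches the "writeOperation is null" step in the sequence. With clear-then-release, T2 cannot install its state until T1's handler has finished with the field.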
Thread handling and synchronizing at the max! Thanks for the insight and your hard work finding this!

> Fix will be applied shortly.
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>