Magic Mark,

> On 17.06.2021 at 23:20, Mark Thomas <ma...@apache.org> wrote:
> 
> On 17/06/2021 09:26, Mark Thomas wrote:
> 
>> I think I might have found one contributing factor to this bug. I need to 
>> run a series of tests to determine whether I am seeing random variation in 
>> test results or a genuine effect.
> 
> It was random effects but I believe I have now found the bug.
> 
> Consider two threads, T1 and T2 writing HTTP/2 response bodies concurrently 
> in the same HTTP/2 Connection.
> 
> You'll need to have the code in front of you to follow what is going on.
> 
> The write:
> 
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1364
> 
> and the associated completion handler
> 
> https://github.com/apache/tomcat/blob/main/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1044
> 
> 
> The detail of the code is fairly complex but all you really need to keep in 
> mind is the following:
> 
> - the writePending semaphore ensures only one thread can write at a time
> 
> - the state of the write is maintained in an OperationState instance that is 
> stored in SocketWrapperBase.writeOperation (L1390)
> 
> - the completion handler clears this state (L1050) and releases the
>  semaphore (L1046)
> 
> 
> The sequence of events for a failure is as follows:
> 
> - T1 obtains the write semaphore (L1366)
> - T1 creates an OperationState and sets writeOperation (L1390)
> - the async write for T1 completes and the completion handler is called
> - T1's completion handler releases the semaphore (L1046)
> - T2 obtains the write semaphore (L1366)
> - T2 creates an OperationState and sets writeOperation (L1390)
> - T1's completion handler clears writeOperation (L1050)
> - the async write for T2 does not complete and the socket is added to
>  the Poller
> - The Poller signals the socket is ready for write
> - The Poller finds writeOperation is null so performs a normal dispatch
>  for write
> - The async write times out as it never receives the notification from
>  the Poller
> 
> The fix is to swap the order of clearing writeOperation and releasing the 
> semaphore.
> 
> Concurrent reads will have the same problem and will be fixed by the same 
> solution.
> 
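
To make the interleaving above concrete, here is a minimal single-threaded replay of it. This is only a sketch and not Tomcat's actual code: the class WriteOrderingSketch and the methods startWrite and pollerSeesOperation are hypothetical names made up for illustration, and only writePending and writeOperation are borrowed from the description above. It replays T1's completion handler with T2 trying to start a write between the handler's two steps, once in the failing order and once in the swapped order.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of the race described above; not Tomcat's real classes.
class WriteOrderingSketch {

    // Stand-in for the per-write state object.
    static final class OperationState {
        final String owner;
        OperationState(String owner) {
            this.owner = owner;
        }
    }

    // Only one writer at a time, as with the writePending semaphore.
    final Semaphore writePending = new Semaphore(1);

    // State of the write currently in flight, or null if there is none.
    OperationState writeOperation;

    // A thread starts a write: take the semaphore, then publish its state.
    boolean startWrite(String owner) {
        if (!writePending.tryAcquire()) {
            return false; // another write is still pending
        }
        writeOperation = new OperationState(owner);
        return true;
    }

    // Replays T1's completion handler with T2 attempting to write between
    // the handler's two steps. Returns what the Poller would find afterwards:
    // true if writeOperation is still set for T2, false if it was wiped.
    static boolean pollerSeesOperation(boolean releaseBeforeClear) {
        WriteOrderingSketch s = new WriteOrderingSketch();
        s.startWrite("T1");

        boolean t2Started;
        if (releaseBeforeClear) {
            // Failing order: release first, clear second.
            s.writePending.release();
            t2Started = s.startWrite("T2"); // succeeds, sets writeOperation
            s.writeOperation = null;        // T1's handler wipes T2's state
        } else {
            // Swapped order: clear first, release second.
            s.writeOperation = null;
            t2Started = s.startWrite("T2"); // fails, semaphore still held
            s.writePending.release();
        }

        // If T2 could not get in during the handler, it gets in afterwards.
        if (!t2Started) {
            s.startWrite("T2");
        }
        return s.writeOperation != null;
    }

    public static void main(String[] args) {
        System.out.println("release then clear: Poller sees state = "
                + pollerSeesOperation(true));
        System.out.println("clear then release: Poller sees state = "
                + pollerSeesOperation(false));
    }
}
```

With release before clear, the Poller finds writeOperation null even though T2's write is pending, which matches the lost notification in the sequence above. With clear before release, T2 cannot set its state until T1's handler is completely done, so nothing is overwritten.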

Thread handling and synchronizing at the max!

Thanks for the insight and your hard work finding this!


> Fix will be applied shortly.
> 
> Mark
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
> 


