On 28/11/2018 15:30, Rémy Maucherat wrote:
> On Wed, Nov 28, 2018 at 4:20 PM Mark Thomas <ma...@apache.org> wrote:
> 
>> On 28/11/2018 15:00, ma...@apache.org wrote:
>>> Author: markt
>>> Date: Wed Nov 28 15:00:06 2018
>>> New Revision: 1847646
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1847646&view=rev
>>> Log:
>>> Fix possible cause of intermittent TestCoyoteOutputStream failures.
>>
>> I thought this would be worthy of a longer explanation than seemed
>> appropriate for a commit message.
>>
> 
> I feel bad for not thinking about it since it does sound quite logical.

I wouldn't feel too bad. I've lost track of the number of logical
explanations I have had for problems that I have had to throw away once
I had more data. To be honest, I'm expecting this to carry on failing
intermittently and that the root cause - when we find it - will turn out
to be my refactoring.

> BTW, the testsuite failed but it wasn't *that*. What a coincidence!

Indeed. I saw that and started to mentally draft an "Ignore my last"
email while I scrolled to the end to see which test had failed.

Mark


> 
> Rémy
> 
> 
>>
>> I have tried to recreate the issue locally without success. I was able
>> to recreate it occasionally running the tests on silvanus.a.o (the CI
>> machine that runs all our buildbot jobs).
>>
>> I captured a network trace that confirmed that this was a server side
>> bug. What I saw was a corrupted response. The headers and first chunk
>> were correct but rather than the 5 bytes of the end chunk I saw the
>> other 8187 (8192-5) bytes of the buffer. It was clear the buffer was
>> configured for write when it was being read.
>>
>> I then tried to figure out how this could happen with a view to
>> reproducing the issue.
>>
>> There were a lot of dead ends during which I noticed that the write
>> pattern varied when I added additional debug statements. I discovered
>> that, depending on timing, the NIO2 endpoint would sometimes use a
>> gathering write when performing a non-blocking flush.
>>
>> There is a non-blocking flush just before the switch back to blocking
>> I/O (after the dispatch to end the async component) and it looked to be
>> possible that the gathering write could still be in progress when the
>> following blocking write was performed. That in turn meant that one of
>> the buffers used by the gathering write could be modified during the
>> following blocking write.
>>
>> However, my current understanding of the code is that the gathering
>> write will have written all the data from the buffer that is used by the
>> following blocking write before that blocking write occurs. So I may
>> have missed the root cause completely. It depends a lot on the internal
>> workings of the AsynchronousSocketChannel.
>>
>> On balance, I decided to commit this fix as there does appear to be a
>> bug here. Hopefully, it is the root cause of the intermittent
>> TestCoyoteOutputStream failures. If it is, great. If not, I'll keep
>> looking.
>>
>> Mark
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: dev-h...@tomcat.apache.org
>>
>>
> 

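For anyone following along, the symptom in the quoted network trace (8187
bytes on the wire where the 5 byte end chunk was expected) is what a
java.nio.ByteBuffer produces if it is handed to a reader while still
configured for write. The sketch below is only an illustration of that
arithmetic, not Tomcat code. The class and method names are hypothetical, and
the 8192 byte capacity and the 5 byte end chunk are taken from the figures
quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferModeSketch {

    // The terminating chunk of a chunked HTTP response is 5 bytes long.
    static final byte[] END_CHUNK = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    // Returns how many bytes a socket write would take from the buffer.
    // 'flipped' selects whether the buffer was switched to read mode first.
    static int bytesSeenByWriter(boolean flipped) {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        buffer.put(END_CHUNK);      // write mode: position=5, limit=8192
        if (flipped) {
            buffer.flip();          // read mode: position=0, limit=5
        }
        return buffer.remaining();  // limit - position
    }

    public static void main(String[] args) {
        System.out.println("flipped:     " + bytesSeenByWriter(true));
        System.out.println("not flipped: " + bytesSeenByWriter(false));
    }
}
```

With the flip, the channel sees the 5 bytes of the end chunk. Without it, the
channel sees the unused remainder of the buffer, 8192-5 = 8187 bytes.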

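The hazard described in the quoted message (a gathering write still in
progress when a following blocking write reuses one of its buffers) can be
sketched with a plain AsynchronousSocketChannel. This is a minimal sketch
under stated assumptions, not the change committed in r1847646: the class
name, the writePending semaphore and the loopback setup are all hypothetical.
It shows one way to make the blocking write wait until the completion handler
of the gathering write has run before the shared buffer is touched.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class GatheringWriteGuardSketch {

    // Hypothetical guard: the single permit is held while an async write is in flight.
    private static final Semaphore writePending = new Semaphore(1);

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String run() throws Exception {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                     .bind(new InetSocketAddress("127.0.0.1", 0));
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
            Future<AsynchronousSocketChannel> pendingAccept = server.accept();
            client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
            try (AsynchronousSocketChannel serverSide = pendingAccept.get(5, TimeUnit.SECONDS)) {
                ByteBuffer header = ByteBuffer.wrap(ascii("5\r\n"));
                ByteBuffer body = ByteBuffer.allocate(8192);
                body.put(ascii("hello\r\n")).flip();

                // Non-blocking flush: the call returns at once, the buffers stay in use.
                writePending.acquire();
                gatheringWrite(serverSide, new ByteBuffer[] {header, body});

                // Blocking write that reuses 'body': wait for the flush to finish first.
                if (!writePending.tryAcquire(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("gathering write did not complete");
                }
                try {
                    body.clear();
                    body.put(ascii("0\r\n\r\n")).flip();
                    while (body.hasRemaining()) {
                        serverSide.write(body).get(5, TimeUnit.SECONDS);
                    }
                } finally {
                    writePending.release();
                }

                // Read back the 15 bytes sent: 3 (header) + 7 (body) + 5 (end chunk).
                ByteBuffer received = ByteBuffer.allocate(64);
                while (received.position() < 15) {
                    if (client.read(received).get(5, TimeUnit.SECONDS) < 0) {
                        break;
                    }
                }
                received.flip();
                return StandardCharsets.US_ASCII.decode(received).toString();
            }
        }
    }

    private static void gatheringWrite(AsynchronousSocketChannel channel, ByteBuffer[] buffers) {
        channel.write(buffers, 0, buffers.length, 5, TimeUnit.SECONDS, buffers,
                new CompletionHandler<Long, ByteBuffer[]>() {
                    @Override
                    public void completed(Long written, ByteBuffer[] pending) {
                        if (pending[pending.length - 1].hasRemaining()) {
                            gatheringWrite(channel, pending); // partial write: continue
                        } else {
                            writePending.release();           // buffers are free again
                        }
                    }

                    @Override
                    public void failed(Throwable cause, ByteBuffer[] pending) {
                        writePending.release();
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run().replace("\r\n", "\\r\\n"));
    }
}
```

If the tryAcquire step is removed, body.clear() can run while the gathering
write still holds a reference to the same buffer. Whether that actually
corrupts the output depends on timing and on the internals of the channel
implementation, which matches the uncertainty expressed in the quoted message.
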