On 28/11/2018 15:30, Rémy Maucherat wrote: > On Wed, Nov 28, 2018 at 4:20 PM Mark Thomas <ma...@apache.org> wrote: > >> On 28/11/2018 15:00, ma...@apache.org wrote: >>> Author: markt >>> Date: Wed Nov 28 15:00:06 2018 >>> New Revision: 1847646 >>> >>> URL: http://svn.apache.org/viewvc?rev=1847646&view=rev >>> Log: >>> Fix possible cause of intermittent TestCoyoteOutputStream failures. >> >> I thought this would be worthy of a longer explanation than seemed >> appropriate for a commit message. >> > > I feel bad for not thinking about it since it does sound quite logical.
I wouldn't feel too bad. I've lost track of the number of logical explanations I have had for problems that I have had to throw away once I had more data. To be honest, I'm expecting this to carry on failing intermittently and that the root cause - when we find it - will turn out to be my refactoring. > BTW, the testsuite failed but it wasn't *that*. What a coincidence ! Indeed. I saw that and started to mentally draft a "Ignore my last" email while I scrolled to the end to see which test had failed. Mark > > Rémy > > >> >> I have tried to recreate the issue locally without success. I was able >> to recreate it occasionally running the tests on silvanus.a.o (the CI >> machine that runs all our buildbot jobs). >> >> I captured a network trace that confirmed that this was a server side >> bug. What I saw was a corrupted response. The headers and first chunk >> were correct but rather than the 5 bytes of the end chunk I saw the >> other 8187 (8192-5) bytes of the buffer. It was clear the buffer was >> configured for write when it was being read. >> >> I then tried to figure out how this could happen with a view to >> reproducing the issue. >> >> There were a lot of dead ends during which I noticed that the write >> pattern varied when I added additional debug statements. I discovered >> that, depending on timing, the NIO2 endpoint would sometimes use a >> gathering write when performing a non-blocking flush. >> >> There is a non-blocking flush just before the switch back to blocking >> I/O (after the dispatch to end the async component) and it looked to be >> possible that the gathering write could still be in progress when the >> following blocking write was performed. That in turn meant that one of >> the buffers used by the gathering write could be modified during the >> following blocking write. >> >> However, my current understanding of the code is that the gathering >> write will have written all the data from the buffer that is used by the >> following blocking write before that blocking write occurs. So I may >> have missed the root cause completely. It depends a lot on the internal >> workings of the AsynchronousSocketChannel. >> >> On balance, I decided to commit this fix as there does appear to be a >> bug here. Hopefully, it is the root cause of the intermittent >> TestCoyoteOutputStream failures. If it is, great. If not, I'll keep >> looking. >> >> Mark >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org >> For additional commands, e-mail: dev-h...@tomcat.apache.org >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org