On 25/02/2015 14:31, Rémy Maucherat wrote:
> 2015-02-24 16:33 GMT+01:00 Mark Thomas <ma...@apache.org>:
> 
>> On 24/02/2015 13:10, Rémy Maucherat wrote:
>>> I'm having issues with the write timeout tests in
>>> TestWsWebSocketContainer, which made me do some changes since there are
>>> still things I don't understand:
>>
>> These appear to be OK for me at the moment with NIO and NIO2 but the
>> very nature of timing issues means that doesn't count for much. I am
>> seeing failures or crashes with APR/native so there is still work to be
>> done there.
>>
>>> - In WsRemoteEndpointImplServer, onWritePossible appears to be able to be
>>> invoked concurrently (doWrite calls it directly and changes the
>> buffers). I
>>> think it should be synced.
>>
>> Those calls should be nested. If you are seeing concurrent calls then
>> there is probably still an issue around write registration.
>>
> 
> I still think there is concurrency there, at least with the first write
> notification (which is concurrent if the first read does write immediately,
> just like our "big" failing test does). Without the read/write concurrency,
> I think there wouldn't be any issue.
> 
> With the TestWebSocketFrameClient failure, the contending traces look like
> (I used a semaphore to isolate them):
>     [junit] java.lang.Exception: Stack trace
>     [junit]     at java.lang.Thread.dumpStack(Thread.java:1329)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:87)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase$OutputBufferSendHandler.write(WsRemoteEndpointImplBase.java:822)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:447)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:338)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:730)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:250)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:193)
>     [junit]     at
> org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
>     [junit]     at
> org.apache.tomcat.websocket.TesterFirehoseServer$Endpoint.onMessage(TesterFirehoseServer.java:121)
>     [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>     [junit]     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     [junit]     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     [junit]     at java.lang.reflect.Method.invoke(Method.java:483)
>     [junit]     at
> org.apache.tomcat.websocket.pojo.PojoMessageHandlerWholeBase.onMessage(PojoMessageHandlerWholeBase.java:80)
>     [junit]     at
> org.apache.tomcat.websocket.WsFrameBase.sendMessageText(WsFrameBase.java:393)
>     [junit]     at
> org.apache.tomcat.websocket.WsFrameBase.processDataText(WsFrameBase.java:494)
>     [junit]     at
> org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:289)
>     [junit]     at
> org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:130)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:56)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:207)
>     [junit]     at
> org.apache.coyote.http11.upgrade.UpgradeServletInputStream.onDataAvailable(UpgradeServletInputStream.java:213)
>     [junit]     at
> org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:108)
>     [junit]     at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658)
>     [junit]     at
> org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1653)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint.processSocket0(Nio2Endpoint.java:578)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint.processSocket(Nio2Endpoint.java:563)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:794)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:775)
>     [junit]     at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>     [junit]     at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>     [junit]     at
> sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>     [junit]     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     [junit]     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     [junit]     at
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>     [junit]     at java.lang.Thread.run(Thread.java:745)
> 
>     [junit] java.lang.Exception: Stack trace
>     [junit]     at java.lang.Thread.dumpStack(Thread.java:1329)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146)
>     [junit]     at
> org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsWriteListener.onWritePossible(WsHttpUpgradeHandler.java:252)
>     [junit]     at
> org.apache.coyote.http11.upgrade.UpgradeServletOutputStream.onWritePossible(UpgradeServletOutputStream.java:252)
>     [junit]     at
> org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:110)
>     [junit]     at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658)
>     [junit]     at
> org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694)
>     [junit]     at
> org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1649)
>     [junit]     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     [junit]     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     [junit]     at
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>     [junit]     at java.lang.Thread.run(Thread.java:745)
> 
> 
>>> - In Nio2Endpoint socket wrapper uses nestedWriteCompletionCount over the
>>> inline flag that was used in 8. If the write completes inline, then
>> isReady
>>> should already be set back to true, and writing could continue. So the
>>> change was IMO adding more write notifications which could hide some
>>> issues. I tried changing that many times following the refactoring
>> started,
>>> but this is the first time I can do it without obviously breaking the
>>> testsuite (where some of the non blocking write tests would hang due to
>>> missing write notifications).
>>
>> This change was to prevent multiple write threads being triggered if
>> there were multiple levels of nesting with the write completion handler.
>> It was a fairly rare event but it did happen.
>>
> 
> If it recurses, only one completion handler (the most nested one) will
> notify. With the algorithm change, it would be the first one, which IMO
> does not make any real difference but changes the timing of the call. So
> not sure but it's not a priority.

ACK. I'll try and remember why I did it they way I did.

>>> - NPE guards in the NIO connector socket processor for concurrent closing
>>> [NIO2 has them, somehow it wasn't needed earlier in NIO, which is also an
>>> odd thing; I actually feel better having to add them].
>>>
>>> So this could improve on some possible timing related problems. I'll keep
>>> on investigating though before committing anything.
>>
>> One thing to keep in mind that may simplify some of these issues is that
>> once WebSocket moves to using the Tomcat I/O layer directly the
>> requirement for one container thread reading and one container thread
>> writing concurrently will go away. A number of the concurrency issues we
>> have observed are triggered by these concurrent threads so switching
>> back to a single thread should help.
>>
> Yes, IMO the read/write concurrency is causing problems.

I was planning on waiting until the build was stable but given that:
- read/write concurrency is at the root of a lot of these issues
- only WebSocket should be using it now in trunk
- the plan is to refactor WebSocket to remove it

I'm going to go back to what I have in git, rebase it to current trunk
and see where we are. If the unit tests pass on the usual platforms I'd
be tempted to commit it. WDYT?

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to