On 25/02/2015 14:31, Rémy Maucherat wrote: > 2015-02-24 16:33 GMT+01:00 Mark Thomas <ma...@apache.org>: > >> On 24/02/2015 13:10, Rémy Maucherat wrote: >>> I'm having issues with the write timeout tests in >>> TestWsWebSocketContainer, which made me do some changes since there are >>> still things I don't understand: >> >> These appear to be OK for me at the moment with NIO and NIO2 but the >> very nature of timing issues means that doesn't count for much. I am >> seeing failures or crashes with APR/native so there is still work to be >> done there. >> >>> - In WsRemoteEndpointImplServer, onWritePossible appears to be able to be >>> invoked concurrently (doWrite calls it directly and changes the >> buffers). I >>> think it should be synced. >> >> Those calls should be nested. If you are seeing concurrent calls then >> there is probably still an issue around write registration. >> > > I still think there is concurrency there, at least with the first write > notification (which is concurrent if the first read does write immediately, > just like our "big" failing test does). Without the read/write concurrency, > I think there wouldn't be any issue. > > With the TestWebSocketFrameClient failure, the contending traces look like > (I used a semaphore to isolate them): > [junit] java.lang.Exception: Stack trace > [junit] at java.lang.Thread.dumpStack(Thread.java:1329) > [junit] at > org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146) > [junit] at > org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:87) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase$OutputBufferSendHandler.write(WsRemoteEndpointImplBase.java:822) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:447) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:338) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:730) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:250) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:193) > [junit] at > org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37) > [junit] at > org.apache.tomcat.websocket.TesterFirehoseServer$Endpoint.onMessage(TesterFirehoseServer.java:121) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [junit] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit] at java.lang.reflect.Method.invoke(Method.java:483) > [junit] at > org.apache.tomcat.websocket.pojo.PojoMessageHandlerWholeBase.onMessage(PojoMessageHandlerWholeBase.java:80) > [junit] at > org.apache.tomcat.websocket.WsFrameBase.sendMessageText(WsFrameBase.java:393) > [junit] at > org.apache.tomcat.websocket.WsFrameBase.processDataText(WsFrameBase.java:494) > [junit] at > org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:289) > [junit] at > org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:130) > [junit] at > org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:56) > [junit] at > org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:207) > [junit] at > org.apache.coyote.http11.upgrade.UpgradeServletInputStream.onDataAvailable(UpgradeServletInputStream.java:213) > [junit] at > org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:108) > [junit] at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658) > [junit] at > org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1653) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint.processSocket0(Nio2Endpoint.java:578) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint.processSocket(Nio2Endpoint.java:563) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:794) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$3.completed(Nio2Endpoint.java:775) > [junit] at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126) > [junit] at sun.nio.ch.Invoker$2.run(Invoker.java:218) > [junit] at > sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112) > [junit] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [junit] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [junit] at > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > [junit] at java.lang.Thread.run(Thread.java:745) > > [junit] java.lang.Exception: Stack trace > [junit] at java.lang.Thread.dumpStack(Thread.java:1329) > [junit] at > org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:146) > [junit] at > org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsWriteListener.onWritePossible(WsHttpUpgradeHandler.java:252) > [junit] at > org.apache.coyote.http11.upgrade.UpgradeServletOutputStream.onWritePossible(UpgradeServletOutputStream.java:252) > [junit] at > org.apache.coyote.http11.upgrade.UpgradeProcessor.upgradeDispatch(UpgradeProcessor.java:110) > [junit] at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:658) > [junit] at > org.apache.coyote.http11.Http11Nio2Protocol$Http11ConnectionHandler.process(Http11Nio2Protocol.java:130) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1694) > [junit] at > org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.run(Nio2Endpoint.java:1649) > [junit] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [junit] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [junit] at > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > [junit] at java.lang.Thread.run(Thread.java:745) > > >>> - In Nio2Endpoint socket wrapper uses nestedWriteCompletionCount over the >>> inline flag that was used in 8. If the write completes inline, then >> isReady >>> should already be set back to true, and writing could continue. So the >>> change was IMO adding more write notifications which could hide some >>> issues. I tried changing that many times following the refactoring >> started, >>> but this is the first time I can do it without obviously breaking the >>> testsuite (where some of the non blocking write tests would hang due to >>> missing write notifications). >> >> This change was to prevent multiple write threads being triggered if >> there were multiple levels of nesting with the write completion handler. >> It was a fairly rare event but it did happen. >> > > If it recurses, only one completion handler (the most nested one) will > notify. With the algorithm change, it would be the first one, which IMO > does not make any real difference but changes the timing of the call. So > not sure but it's not a priority.
ACK. I'll try and remember why I did it they way I did. >>> - NPE guards in the NIO connector socket processor for concurrent closing >>> [NIO2 has them, somehow it wasn't needed earlier in NIO, which is also an >>> odd thing; I actually feel better having to add them]. >>> >>> So this could improve on some possible timing related problems. I'll keep >>> on investigating though before committing anything. >> >> One thing to keep in mind that may simplify some of these issues is that >> once WebSocket moves to using the Tomcat I/O layer directly the >> requirement for one container thread reading and one container thread >> writing concurrently will go away. A number of the concurrency issues we >> have observed are triggered by these concurrent threads so switching >> back to a single thread should help. >> > Yes, IMO the read/write concurrency is causing problems. I was planning on waiting until the build was stable but given that: - read/write concurrency is at the root of a lot of these issues - only WebSocket should be using it now in trunk - the plan is to refactor WebSocket to remove it I'm going to go back to what I have in git, rebase it to current trunk and see where we are. If the unit tests pass on the usual platforms I'd be tempted to commit it. WDYT? Mark --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org