I am using Jetty as a standalone app server in a production system.  The 
primary use of the servers in question is for handling WebSocket traffic.  We 
chose Jetty because, in my analysis, it seems to be king in the Java world when 
it comes to a high-performance implementation of non-blocking IO.  But... I 
have run across a strange error/behavior I can't explain, and I was hoping for 
any insight people might have.

Here is my best effort to describe the behavior without information overload.

When we have a solid number of WebSockets open on a server (say 5-10K) and we 
send a message to all of them every second (so that's 5-10K messages going out 
each second), there seems to be a threshold where we start getting the 
following errors when calling connection.sendMessage(stringMsg), where 
connection is an org.eclipse.jetty.websocket.WebSocket.Connection:

java.io.IOException: Write timeout 
or
java.nio.channels.ClosedChannelException
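
For context, the send path looks roughly like this (a simplified sketch, not 
our actual code; the class and registry names are illustrative):

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;

    import org.eclipse.jetty.websocket.WebSocket.Connection;

    // Simplified sketch of our broadcast path; names are illustrative.
    public class Broadcaster
    {
        // Registry of open connections, keyed by an application-level id.
        private final ConcurrentHashMap<String, Connection> registry =
                new ConcurrentHashMap<String, Connection>();

        // Called roughly once per second with the message for all clients.
        public void broadcast(String stringMsg)
        {
            for (Connection connection : registry.values())
            {
                long start = System.nanoTime();
                try
                {
                    connection.sendMessage(stringMsg);
                }
                catch (IOException e)
                {
                    // Under load, this is where the "Write timeout" and
                    // ClosedChannelException show up.
                }
                long elapsedMs = (System.nanoTime() - start) / 1000000L;
                // 99.99+% of calls finish in 0-3 ms; the failing ones
                // block for roughly MaxIdleTime before throwing.
            }
        }
    }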

If you are familiar with the Write timeout, you know it happens when the buffer 
in WebSocketGenerator is full and Jetty blocks while writing to the channel; if 
it blocks longer than MaxIdleTime, it throws the Write timeout error.  This is 
the more annoying of the two errors, because the thread we use to send the 
message is tied up for the full length of MaxIdleTime.  The 
ClosedChannelException is mostly thrown within a few milliseconds, so it's less 
annoying.

Even more annoying: in both cases we find that connection.isOpen() will mostly 
still return true after these errors occur.
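
To make the isOpen() point concrete, what we observe after a failed send is 
roughly this (simplified, same imports as the sketch above; the helper name is 
illustrative):

    // Illustrative helper around a single send, showing the failure mode.
    private void sendAndCheck(Connection connection, String stringMsg)
    {
        try
        {
            connection.sendMessage(stringMsg);
        }
        catch (IOException e)
        {
            // e is either "java.io.IOException: Write timeout" or a
            // java.nio.channels.ClosedChannelException (which extends
            // IOException, so one catch covers both).
            boolean stillOpen = connection.isOpen(); // mostly still true here
            // A retry of sendMessage() on the same connection just blocks
            // and times out again, so that WebSocket is effectively hung.
        }
    }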

Anybody have any ideas after the above explanation?

Here's some more info:
- We are using Jetty 7.6.4. 

- MaxIdleTime in our jetty.xml is 300 seconds, but we set MaxIdleTime for each 
connection to 60 seconds in onOpen (see the sketch after this list), so the 
Write timeout occurs after we wait 60 seconds.

- When sending, say, 5000 msg/second, there are 0 errors; all 
connection.sendMessage() calls return in under 5 seconds, and 99.99+% of them 
take 0-3 ms.

- When sending, say, 10000 msg/second, there are a small number of errors, 
roughly one a minute.  99.99+% of the messages still take 0 to a few ms, but 
3-5 messages each minute take 45 seconds or more, and many of those go above 60 
seconds and throw the exception.  Note: if we change MaxIdleTime to 300 
seconds, there is almost no decrease in the number of exceptions, maybe a few 
fewer.

- When we get these exceptions, isOpen() still often returns true, so if we 
retry sending the message, the second try will also time out.  That WebSocket 
has simply stopped.  So, as you can see, this is not some linear build-up due 
purely to overload: at 5000/second all WebSockets can run for hours and none of 
them will fail (note we run these tests with the servers and the WebSocket 
client test machines all in Amazon AWS, so the network is solid), and then if 
we run at 10000/second, we will have failed WebSockets every minute, while 
99.99% of all messages and 99% of all WebSockets still receive each message in 
just a few milliseconds.

- When we add more WebSockets or increase the message rate, we also increase 
the number of test client machines, so we don't believe the problem is on the 
test client side.

- CPU and memory on the server are healthy during the test.  CPU runs at 
30-50% during the 5000 and 10000 msg/second tests, and memory usage is low; 
most of it is our registry objects that hold the handles to the WebSockets, and 
10K of those is nothing in terms of memory.  We are running on 4-core boxes 
with 7 GB of RAM.
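
For completeness, the per-connection MaxIdleTime override happens in onOpen, 
roughly like this (a simplified sketch of our socket class; the class name and 
registry calls are illustrative, and the connector-level 300-second maxIdleTime 
stays in jetty.xml):

    import org.eclipse.jetty.websocket.WebSocket;
    import org.eclipse.jetty.websocket.WebSocket.Connection;

    // Simplified sketch of our per-connection WebSocket implementation.
    public class OurSocket implements WebSocket.OnTextMessage
    {
        private Connection connection;

        public void onOpen(Connection connection)
        {
            this.connection = connection;
            // Override the connector-level 300 s idle time with 60 s
            // (value is in milliseconds) for this connection.
            connection.setMaxIdleTime(60000);
            // ... register this connection in our registry ...
        }

        public void onMessage(String data)
        {
            // ... handle inbound messages from the client ...
        }

        public void onClose(int closeCode, String message)
        {
            // ... remove this connection from our registry ...
        }
    }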


So, given these details, does anybody have a good explanation for this kind of 
behavior?  Is this expected?  Any ideas on what we could tune to get rid of 
this problem, or at least reduce the number of "hung" WebSockets?

Thanks in advance for any ideas.
-L


   