Re: [clug-talk] nasty problem

Robert Lewko Thu, 18 May 2006 06:49:11 -0700

On 5/18/06, Gustin Johnson <[EMAIL PROTECTED] > wrote:

> Why are you using UDP instead of TCP?  Since your app is going over
> satellite, and latency is not as much of an issue as reliability, TCP
> seems better suited.  If a UDP packet is lost on the wire, there is no
> mechanism to help you isolate the problem.  It is possible that your app
> is being filtered.  You could try to change the port, but I would
> seriously consider TCP.  Keep in mind I am a network admin and not a
> developer, you may have a valid reason for using UDP, it is just S.E.P.
> from my point of view.

The reason I used UDP is that it needs no fork/exec and no new socket per client. First you have to understand that it's not the latency that makes satellite communication hard, although without altering any socket options TCP will detect the latency and start to back off.

What happens is that you have two things to worry about. On the client end you have to worry about losing the connection and reconnecting, i.e. one satellite passes out of LOS (line of sight) and it can be up to 20-25 minutes before the next one is in sight, but the average is 10 minutes without service. When the next satellite comes into view you have a 2-3 minute period when you may have very sporadic network availability. You may have a 6 second period with network availability, just enough time to dial and get a connection without getting data through. Once the next satellite is fully in sight you can have 90 minutes to 2 hours of good service.

So let's consider what it would look like if we used TCP. This application gets a file in a directory (not my design in this part) every 5 minutes. It parses the file, puts it in packets, then sends it to the server. So it does a write to a TCP socket and gets an error: "No route to host". So it closes the socket and calls the windoze shit that dials the network through the modem. Great! Now you have a connection. OK, you are in one of those 6 second spots of connectivity at the start of getting a new satellite. So you start to send a 4k packet at 9600 baud. Do you see a problem? You won't get your packet sent before the network goes down. Remember the 3-way handshake that TCP needs to complete before a connection is made. Well, that uses up about 3 seconds right there. Using UDP you can actually get two 2k packets through, with their acks returned, in 6 seconds. BTW I have restricted myself to a 2k packet size in my program.
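To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes 10 bits on the wire per byte (8N1 async serial framing), one bit per baud, and no protocol header overhead. None of those figures come from the actual link, so treat the results as ballpark only. The 3 second handshake and 6 second window are the numbers quoted above.

```python
# Assumption: 8N1 serial framing, i.e. 10 bits on the wire per payload byte,
# and 1 bit per symbol so that 9600 baud == 9600 bits/s.
BITS_PER_BYTE = 10

def transfer_seconds(payload_bytes, baud=9600, bits_per_byte=BITS_PER_BYTE):
    """Time to push payload_bytes through the modem, ignoring header overhead."""
    return payload_bytes * bits_per_byte / baud

WINDOW = 6.0      # seconds of connectivity, from the text above
HANDSHAKE = 3.0   # approximate TCP 3-way handshake cost, from the text above

# TCP: handshake first, then the 4k packet.
tcp_total = HANDSHAKE + transfer_seconds(4096)

# UDP: no handshake, two 2k packets back to back (ack time not included).
udp_total = 2 * transfer_seconds(2048)

print(f"TCP 4k packet: {tcp_total:.2f}s needed, window is {WINDOW:.0f}s")
print(f"UDP 2 x 2k:    {udp_total:.2f}s needed, window is {WINDOW:.0f}s")
```

Under those assumptions the TCP case overruns the window while the two UDP packets leave a little slack for the acks to come back.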

What's happening on the server? There are two things you could do: construct a single-threaded server or one that uses fork/exec. They each exhibit a different form of the same problem. The server accepts a new connection. The accept system call takes a new connection off the listening socket's queue and returns a new fd for it. (The new socket keeps the same local port as the listening socket; the kernel tells connections apart by the client's address and port.)

The single-threaded server will use fd's and the fork/exec server will use slots in the process table. To make that clearer: the single-threaded server will see activity that indicates a new connection, call accept to get a new fd for that connection, and manage the new connection in the next select call. The concurrent server (one process per connection) will wait for accept to return a new fd for a new connection, then fork/exec to make a new process to handle that connection.

OK, so data comes into that fd for a while until the client's connection breaks. At that point the server socket will wait for hours, literally indefinitely, for more data on that fd. So now you have to put a timer on each fd/process so you can detect when no data has been received for whatever timeout period you decide to use (what do you use for the timeout period?). Keep in mind that the 2-3 minute period can generate 10-12 broken connections. So depending on which server design is used, there will be either 10-12 unused fd's that have no client or 10-12 processes sitting there listening with no client to give them data. Also know that there are possibly 10 to 12 mobile systems doing that, so now you have the possibility of 100 to 150 unused resources that need to be cleaned up, and there is a maximum number of fd's per process and a maximum number of processes that can run. So, what if they bought another company with a similar number of trucks, or another company 10 times larger bought them 'cause of the way that they do "real time" testing? Instant problem!
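For what it's worth, the per-fd timer bookkeeping a TCP server would need could be sketched like this in Python. The class name, method names and the timeout value are all made up for illustration; picking a sensible timeout is exactly the open question above.

```python
import time

class IdleTracker:
    """Remember when each fd last produced data so idle ones can be reaped.

    Hypothetical helper, not from any library.
    """

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a fd counts as dead
        self.last_seen = {}      # fd -> time of last received data

    def touch(self, fd, now=None):
        # Call whenever data arrives on fd (or when it is first accepted).
        self.last_seen[fd] = time.monotonic() if now is None else now

    def expired(self, now=None):
        # Return the fds that have been silent for longer than the timeout.
        now = time.monotonic() if now is None else now
        return [fd for fd, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def drop(self, fd):
        # Forget a fd after the server has closed it.
        self.last_seen.pop(fd, None)
```

A select loop would call touch() on every readable fd, then close and drop() whatever expired() returns on each pass. The fork/exec design needs the same idea per process, for example via an alarm in each child.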

This whole discussion assumes that when the client sends some data and gets a broken connection, there is no way to tell the socket that it can try again. If someone knows how to do that and can point me to docs, I will be glad of the info. In my reading of Stevens I didn't see how to recover from a broken connection.

Using UDP just sidesteps these issues. You put the responsibility for the communication on the client. The client is the one that detects when a packet has not been delivered, by putting a sequence/timestamp in each packet and comparing that tuple to each packet that is returned. If the seq/ts does not match the one you are looking for, then dump it. When it does match you can process the packet and transmit the next one (there are more efficient ways of handling multiple packets, but I'm keeping it simple).
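A minimal Python sketch of that client-side loop follows. The header layout (32-bit sequence number plus 64-bit millisecond timestamp), the function name and the timeout/retry values are assumptions for illustration, not what my program actually uses.

```python
import socket
import struct
import time

# Hypothetical header: 32-bit sequence number + 64-bit millisecond timestamp.
HEADER = struct.Struct("!IQ")

def send_reliably(sock, server, seq, payload, timeout=3.0, retries=5):
    """Send one packet and wait for a reply carrying the same seq/ts.

    Returns True once the matching ack arrives, False if every retry times out.
    """
    header = HEADER.pack(seq, int(time.time() * 1000))
    for _ in range(retries):
        sock.sendto(header + payload, server)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                      # retransmit delay is over, resend
            sock.settimeout(remaining)
            try:
                reply, _addr = sock.recvfrom(4096)
            except socket.timeout:
                break                      # nothing came back, resend
            if reply[:HEADER.size] == header:
                return True                # seq/ts match: packet was acked
            # seq/ts do not match: stale or stray packet, dump it and keep waiting
    return False
```

The caller only moves on to the next packet when this returns True. If it returns False, the link is probably down and it is time to redial.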

The UDP server can be MUCH simpler by handling each packet as a self-contained entity. It gets a packet from the client, processes that packet, then uses the source address as the destination for the return packet. No fork, no exec, no accept/new fd, no cleanup. With UDP you can have one process that deals with one packet at a time. What you have to ensure is that the server does not get so busy that clients fail to get their response before the end of the retransmit delay.
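In Python the whole server loop could be as small as the sketch below. The 12-byte header size, the handle_packet() helper and the max_packets argument (there only so the loop can be stopped in a test) are assumptions for illustration; the real processing would go where the comment says.

```python
import socket

HEADER_SIZE = 12  # hypothetical: size of the seq/timestamp header to echo back

def handle_packet(data):
    """Process one self-contained packet and build its ack.

    Placeholder: real parsing/storage of the payload would happen here.
    The ack is just the packet's own header sent back to the client.
    """
    return data[:HEADER_SIZE]

def serve(sock, max_packets=None):
    """One process, one socket, one packet at a time."""
    handled = 0
    while max_packets is None or handled < max_packets:
        data, addr = sock.recvfrom(4096)          # addr is the packet's source
        sock.sendto(handle_packet(data), addr)    # reply goes straight back to it
        handled += 1

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", 9999))   # port number is an arbitrary example
    serve(s)
```

There is no per-client state at all, so a client that vanishes mid-transfer leaves nothing behind to clean up.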
